{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "31b8944e-8163-4fcb-bd2b-5ae9935a913e",
   "metadata": {},
   "source": [
    "# Named Entity Recognition (NER) with BERT"
   ]
  },
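  {
   "cell_type": "markdown",
   "id": "62754378-3a58-4c65-8017-f19a78554206",
   "metadata": {},
   "source": [
    "😋😋 Reply with the keyword **torchkeras** in the WeChat official account 算法美食屋 to get the notebook source code and the dataset download link for this article."
   ]
  },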
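  {
   "cell_type": "markdown",
   "id": "c24ed255-7add-4184-8033-6e13ff802f62",
   "metadata": {},
   "source": [
    "Named entity recognition (NER) is a common NLP task.\n",
    "\n",
    "Put simply, it identifies the named entities in a sentence, such as person names, place names, and organizations.\n",
    "\n",
    "For example, take the following sentence:\n",
    "\n",
    "```\n",
    "小明对小红说:\"你听说过安利吗？\"\n",
    "```\n",
    "\n",
    "Its NER result is:\n",
    "\n",
    "```\n",
    "[{'entity': 'person',\n",
    "  'word': '小明',\n",
    "  'start': 0,\n",
    "  'end': 2},\n",
    " {'entity': 'person',\n",
    "  'word': '小红',\n",
    "  'start': 3,\n",
    "  'end': 5},\n",
    " {'entity': 'organization',\n",
    "  'word': '安利',\n",
    "  'start': 12,\n",
    "  'end': 14}]\n",
    "```\n",
    "\n",
    "In essence, NER is a token classification task: every token in the text is assigned a class.\n",
    "\n",
    "Tokens that are not part of any named entity are usually labeled with a capital 'O'.\n",
    "\n",
    "Note that some named entities span several consecutive tokens. So that two adjacent entities of the same type can be told apart, we also need to mark whether a token begins an entity.\n",
    "\n",
    "Take the sentence '我爱北京天安门'. If we do not mark entity-initial tokens, we might get the following token classification:\n",
    "\n",
    "```\n",
    "我(O) 爱(O) 北(Loc) 京(Loc) 天(Loc) 安(Loc) 门(Loc)\n",
    "```\n",
    "\n",
    "During post-processing, joining consecutive tokens of the same class then yields a single location entity, '北京天安门'.\n",
    "\n",
    "However, '北京' and '天安门' are two different location entities, and it is more reasonable to keep them apart. We can therefore classify the tokens like this:\n",
    "\n",
    "我(O) 爱(O) 北(B-Loc) 京(I-Loc) 天(B-Loc) 安(I-Loc) 门(I-Loc)\n",
    "\n",
    "B-Loc marks the first token of a location entity, and I-Loc marks a token inside one (in the middle or at the end). During post-processing we join each B-Loc with the I-Loc tokens that follow it, which gives '北京' and '天安门' as two separate locations.\n",
    "\n",
    "The benefit of marking entity-initial tokens is that adjacent entities of the same type can be separated; the cost is that the number of classes nearly doubles (from n+1 to 2n+1 for n entity types).\n",
    "Adjacent entities of the same type are uncommon in many datasets, but marking the entity start is still the safer choice."
   ]
  },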
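  {
   "cell_type": "markdown",
   "id": "b7e1c2a4-5d3f-4a6b-8c9d-0e1f2a3b4c5d",
   "metadata": {},
   "source": [
    "The post-processing step described above can be sketched in a few lines of Python. This is a minimal illustration and not part of the original pipeline: the helper name `bio_to_entities` is hypothetical, and it assumes that tokens and tags are equal-length lists and that the tags follow the `B-`/`I-`/`O` scheme.\n",
    "\n",
    "```python\n",
    "def bio_to_entities(tokens, tags):\n",
    "    # Group B-/I- tagged tokens into (entity_type, text, start, end) spans.\n",
    "    # bio_to_entities is a hypothetical helper written for illustration.\n",
    "    entities = []\n",
    "    current = None  # [entity_type, start_index, list_of_tokens]\n",
    "    for i, (tok, tag) in enumerate(zip(tokens, tags)):\n",
    "        if tag.startswith('B-'):\n",
    "            # a B- tag always closes the open entity and starts a new one\n",
    "            if current is not None:\n",
    "                entities.append(current)\n",
    "            current = [tag[2:], i, [tok]]\n",
    "        elif tag.startswith('I-') and current is not None and current[0] == tag[2:]:\n",
    "            # an I- tag of the same type extends the open entity\n",
    "            current[2].append(tok)\n",
    "        else:\n",
    "            # 'O', or an I- tag that does not continue the open entity\n",
    "            if current is not None:\n",
    "                entities.append(current)\n",
    "            current = None\n",
    "    if current is not None:\n",
    "        entities.append(current)\n",
    "    return [(tp, ''.join(toks), start, start + len(toks))\n",
    "            for tp, start, toks in entities]\n",
    "\n",
    "tokens = list('我爱北京天安门')\n",
    "tags = ['O', 'O', 'B-Loc', 'I-Loc', 'B-Loc', 'I-Loc', 'I-Loc']\n",
    "print(bio_to_entities(tokens, tags))\n",
    "# [('Loc', '北京', 2, 4), ('Loc', '天安门', 4, 7)]\n",
    "```\n",
    "\n",
    "Without the `B-` tags, the five location tokens would merge into one span, which is exactly the ambiguity the BIO scheme removes."
   ]
  },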
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ac22c0f6-a9f1-4120-afa1-cff2ab1c2455",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "id": "30575a92-6e23-43f4-aad2-8a713605f285",
   "metadata": {},
   "source": [
    "## 1. Prepare the data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "a62aa31e-a84a-4a18-ba8d-191216e51e59",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np \n",
    "import pandas as pd \n",
    "\n",
    "from transformers import BertTokenizer\n",
    "from torch.utils.data import DataLoader,Dataset \n",
    "from transformers import DataCollatorForTokenClassification\n",
    "import datasets  \n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a66ee482-62e2-4c27-ad98-5887463bff3e",
   "metadata": {},
   "source": [
    "### 1. Load the data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "2b918503-7c22-49dd-aa2e-d86fb7df3986",
   "metadata": {},
   "outputs": [],
   "source": [
    "datadir = \"./data/cluener_public/\"\n",
    "\n",
    "train_path = datadir+\"train.json\"\n",
    "val_path = datadir+\"dev.json\"\n",
    "\n",
    "dftrain = pd.read_json(train_path,lines=True)\n",
    "# the validation set is read from dev.json, not from the training file\n",
    "dfval = pd.read_json(val_path,lines=True)\n",
    "\n",
    "entities = ['address','book','company','game','government','movie',\n",
    "              'name','organization','position','scene']\n",
    "\n",
    "# 'O' plus a B- and an I- tag per entity type: 2*10+1 = 21 labels\n",
    "label_names = ['O']+['B-'+x for x in entities]+['I-'+x for x in entities]\n",
    "\n",
    "id2label = {i: label for i, label in enumerate(label_names)}\n",
    "label2id = {v: k for k, v in id2label.items()}\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "9077c987-edee-492f-9885-d9b32bc30da4",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "世上或许有两个人并不那么喜欢LewisCarroll的原著小说《爱丽斯梦游奇境》(\n",
      "{'book': {'《爱丽斯梦游奇境》': [[31, 39]]}, 'name': {'LewisCarroll': [[14, 25]]}}\n"
     ]
    }
   ],
   "source": [
    "text = dftrain[\"text\"][43]\n",
    "label = dftrain[\"label\"][43]\n",
    "\n",
    "print(text)\n",
    "print(label)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9927db52-81ad-448c-bd5a-8e359492a01c",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "id": "ad34f5b4-6e46-49ec-b14c-018a207bddc7",
   "metadata": {},
   "source": [
    "### 2. Tokenize the text"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "0aac2f1f-66f5-4ae0-ae81-b499d555eeae",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "BertTokenizer(name_or_path='bert-base-chinese', vocab_size=21128, model_max_length=512, is_fast=False, padding_side='right', truncation_side='right', special_tokens={'unk_token': '[UNK]', 'sep_token': '[SEP]', 'pad_token': '[PAD]', 'cls_token': '[CLS]', 'mask_token': '[MASK]'}, clean_up_tokenization_spaces=True)\n"
     ]
    }
   ],
   "source": [
    "from transformers import BertTokenizer\n",
    "model_name = 'bert-base-chinese'\n",
    "tokenizer = BertTokenizer.from_pretrained(model_name) \n",
    "print(tokenizer) "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "543d2714-bea4-445e-91dd-09d1a3d5600b",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[101, 686, 677, 2772, 6387, 3300, 697, 702, 782, 2400, 679, 6929, 720, 1599, 3614, 100, 4638, 1333, 5865, 2207, 6432, 517, 4263, 714, 3172, 3457, 3952, 1936, 1862, 518, 113, 102]\n"
     ]
    }
   ],
   "source": [
    "tokenized_input = tokenizer(text)\n",
    "print(tokenized_input[\"input_ids\"])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "0b83a262-8ea2-4543-b31f-81b34f242576",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[CLS]\n",
      "世\n",
      "上\n",
      "或\n",
      "许\n",
      "有\n",
      "两\n",
      "个\n",
      "人\n",
      "并\n",
      "不\n",
      "那\n",
      "么\n",
      "喜\n",
      "欢\n",
      "[UNK]\n",
      "的\n",
      "原\n",
      "著\n",
      "小\n",
      "说\n",
      "《\n",
      "爱\n",
      "丽\n",
      "斯\n",
      "梦\n",
      "游\n",
      "奇\n",
      "境\n",
      "》\n",
      "(\n",
      "[SEP]\n"
     ]
    }
   ],
   "source": [
    "# recover the token string that each id stands for\n",
    "tokens = tokenizer.convert_ids_to_tokens(tokenized_input[\"input_ids\"])\n",
    "for t in tokens:\n",
    "    print(t)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "af4b2324-1fbc-4037-b53d-95d9026ad24f",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "id": "e0704f41-daed-4a06-aea3-b57c8853c630",
   "metadata": {},
   "source": [
    "### 3. Align the labels"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9e8f61c4-ea7c-4931-9f3f-7b88092aeda2",
   "metadata": {},
   "source": [
    "可以看到，经过文本分词后的token长度与文本长度并不相同，\n",
    "\n",
    "主要有以下一些原因导致：一是BERT分词后会增加一些特殊字符如 `[CLS]`,`[SEP]`\n",
    "\n",
    "二是，还会有一些英文单词的subword作为一个 token. (如这个例子中的 'charles')，\n",
    "\n",
    "此外，还有一些未在词典中的元素被标记为`[UNK]`会造成影响。\n",
    "\n",
    "因此需要给这些token赋予正确的label不是一个容易的事情。\n",
    "\n",
    "我们分两步走，第一步，把原始的dict形式的label转换成字符级别的char_label\n",
    "\n",
    "第二步，再将char_label对齐到token_label"
   ]
  },
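  {
   "cell_type": "markdown",
   "id": "c3d9e8f1-2a4b-4c6d-9e7f-1a2b3c4d5e6f",
   "metadata": {},
   "source": [
    "As a hedged aside on the second step: with a fast tokenizer (for example `BertTokenizerFast` called with `return_offsets_mapping=True`), every token comes with its `(start, end)` character span, so the token label can be read directly from char_label. This is an alternative sketch, not the method used in this notebook, which walks tokens and characters in parallel. The helper name `align_by_offsets` is hypothetical, and the offsets below are written by hand for illustration rather than produced by a tokenizer.\n",
    "\n",
    "```python\n",
    "def align_by_offsets(char_label, offsets):\n",
    "    # offsets: one (start, end) character span per token.\n",
    "    # Assumption: special tokens such as [CLS] and [SEP] have the empty span (0, 0).\n",
    "    token_labels = []\n",
    "    for start, end in offsets:\n",
    "        if start == end:\n",
    "            # empty span: a special token, labeled 'O'\n",
    "            token_labels.append('O')\n",
    "        else:\n",
    "            # the token takes the label of its first character\n",
    "            token_labels.append(char_label[start])\n",
    "    return token_labels\n",
    "\n",
    "# hand-written example for the text 'LewisC的书',\n",
    "# where 'LewisC' is assumed to be covered by a single token\n",
    "char_label = ['B-name'] + ['I-name'] * 5 + ['O', 'O']\n",
    "offsets = [(0, 0), (0, 6), (6, 7), (7, 8), (0, 0)]\n",
    "print(align_by_offsets(char_label, offsets))\n",
    "# ['O', 'B-name', 'O', 'O', 'O']\n",
    "```\n",
    "\n",
    "Taking the label of the first character keeps a `B-` tag on the first piece of an entity, while later pieces inherit `I-` tags from their own first characters."
   ]
  },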
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "0ae31f51-a48a-48bd-92a6-6b4e587f1acd",
   "metadata": {},
   "outputs": [],
   "source": [
    "# 把 label格式转化成字符级别的char_label\n",
    "def get_char_label(text,label):\n",
    "    char_label = ['O' for x in text]\n",
    "    for tp,dic in label.items():\n",
    "        for word,idxs in dic.items():\n",
    "            idx_start = idxs[0][0]\n",
    "            idx_end = idxs[0][1]\n",
    "            char_label[idx_start] = 'B-'+tp\n",
    "            char_label[idx_start+1:idx_end+1] = ['I-'+tp for x in range(idx_start+1,idx_end+1)]\n",
    "    return char_label \n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "3f58b7aa-5af5-4b7a-8d31-bf5a24b782da",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "世\tO\n",
      "上\tO\n",
      "或\tO\n",
      "许\tO\n",
      "有\tO\n",
      "两\tO\n",
      "个\tO\n",
      "人\tO\n",
      "并\tO\n",
      "不\tO\n",
      "那\tO\n",
      "么\tO\n",
      "喜\tO\n",
      "欢\tO\n",
      "L\tB-name\n",
      "e\tI-name\n",
      "w\tI-name\n",
      "i\tI-name\n",
      "s\tI-name\n",
      "C\tI-name\n",
      "a\tI-name\n",
      "r\tI-name\n",
      "r\tI-name\n",
      "o\tI-name\n",
      "l\tI-name\n",
      "l\tI-name\n",
      "的\tO\n",
      "原\tO\n",
      "著\tO\n",
      "小\tO\n",
      "说\tO\n",
      "《\tB-book\n",
      "爱\tI-book\n",
      "丽\tI-book\n",
      "斯\tI-book\n",
      "梦\tI-book\n",
      "游\tI-book\n",
      "奇\tI-book\n",
      "境\tI-book\n",
      "》\tI-book\n",
      "(\tO\n"
     ]
    }
   ],
   "source": [
    "char_label = get_char_label(text,label)\n",
    "for char,char_tp in zip(text,char_label):\n",
    "    print(char+'\\t'+char_tp)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "01168cfa-209f-4859-ad95-e79900d58aa3",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Align char_label to the tokens by walking tokens and characters in parallel\n",
    "def get_token_label(text, char_label, tokenizer):\n",
    "    tokenized_input = tokenizer(text)\n",
    "    tokens = tokenizer.convert_ids_to_tokens(tokenized_input[\"input_ids\"])\n",
    "    \n",
    "    iter_tokens = iter(tokens)\n",
    "    iter_char_label = iter(char_label)  \n",
    "    iter_text = iter(text.lower()) \n",
    "\n",
    "    token_labels = []\n",
    "\n",
    "    t = next(iter_tokens)\n",
    "    char = next(iter_text)\n",
    "    char_tp = next(iter_char_label)\n",
    "\n",
    "    while True:\n",
    "        # a single-character token (e.g. a Chinese character) takes the label of the matching character\n",
    "        if len(t)==1:\n",
    "            assert t==char\n",
    "            token_labels.append(char_tp)   \n",
    "            try:\n",
    "                char = next(iter_text)\n",
    "                char_tp = next(iter_char_label)\n",
    "            except StopIteration:\n",
    "                pass  \n",
    "\n",
    "        # special tokens added by the tokenizer, such as [CLS] and [SEP]; [UNK] is handled below\n",
    "        elif t in tokenizer.special_tokens_map.values() and t!='[UNK]':\n",
    "            token_labels.append('O')              \n",
    "\n",
    "\n",
    "        elif t=='[UNK]':\n",
    "            token_labels.append(char_tp) \n",
    "            # re-align: skip the characters covered by [UNK] until the next token starts\n",
    "            try:\n",
    "                t = next(iter_tokens)\n",
    "            except StopIteration:\n",
    "                break \n",
    "\n",
    "            if t not in tokenizer.special_tokens_map.values():\n",
    "                while char!=t[0]:\n",
    "                    try:\n",
    "                        char = next(iter_text)\n",
    "                        char_tp = next(iter_char_label)\n",
    "                    except StopIteration:\n",
    "                        # text exhausted: stop instead of looping forever\n",
    "                        break    \n",
    "            continue\n",
    "\n",
    "        # other tokens longer than one character, e.g. English word pieces\n",
    "        else:\n",
    "            t_label = char_tp\n",
    "            t = t.replace('##','') # strip the '##' prefix that marks subword pieces\n",
    "            for c in t:\n",
    "                assert c==char or char not in tokenizer.vocab\n",
    "                if t_label!='O':\n",
    "                    t_label=char_tp\n",
    "                try:\n",
    "                    char = next(iter_text)\n",
    "                    char_tp = next(iter_char_label)\n",
    "                except StopIteration:\n",
    "                    pass    \n",
    "            token_labels.append(t_label) \n",
    "\n",
    "        try:\n",
    "            t = next(iter_tokens)\n",
    "        except StopIteration:\n",
    "            break  \n",
    "            \n",
    "    assert len(token_labels)==len(tokens)\n",
    "    return token_labels \n",
    "    "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "8130a474-755c-4721-808f-32b004a27a20",
   "metadata": {},
   "outputs": [],
   "source": [
    "token_labels = get_token_label(text,char_label,tokenizer)\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "id": "e9b3b41b-c73e-4bc9-893d-87396ba18768",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[CLS] \t O\n",
      "世 \t O\n",
      "上 \t O\n",
      "或 \t O\n",
      "许 \t O\n",
      "有 \t O\n",
      "两 \t O\n",
      "个 \t O\n",
      "人 \t O\n",
      "并 \t O\n",
      "不 \t O\n",
      "那 \t O\n",
      "么 \t O\n",
      "喜 \t O\n",
      "欢 \t O\n",
      "[UNK] \t B-name\n",
      "的 \t O\n",
      "原 \t O\n",
      "著 \t O\n",
      "小 \t O\n",
      "说 \t O\n",
      "《 \t B-book\n",
      "爱 \t I-book\n",
      "丽 \t I-book\n",
      "斯 \t I-book\n",
      "梦 \t I-book\n",
      "游 \t I-book\n",
      "奇 \t I-book\n",
      "境 \t I-book\n",
      "》 \t I-book\n",
      "( \t O\n",
      "[SEP] \t O\n"
     ]
    }
   ],
   "source": [
    "for t,t_label in zip(tokens,token_labels):\n",
    "    print(t,'\\t',t_label)\n",
    "    "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c8805c17-eb3e-4b01-a6ec-43db9c62dea0",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "id": "a2312b15-1f8b-4615-b9a4-4e1e2a50e1ea",
   "metadata": {},
   "source": [
    "### 4. Build the data pipeline"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "id": "cf94d599-6f94-46db-9f28-cd8556043601",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>text</th>\n",
       "      <th>label</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>浙商银行企业信贷部叶老桂博士则从另一个角度对五道门槛进行了解读。叶老桂认为，对目前国内商业银...</td>\n",
       "      <td>{'name': {'叶老桂': [[9, 11]]}, 'company': {'浙商银行...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>生生不息CSOL生化狂潮让你填弹狂扫</td>\n",
       "      <td>{'game': {'CSOL': [[4, 7]]}}</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>那不勒斯vs锡耶纳以及桑普vs热那亚之上呢？</td>\n",
       "      <td>{'organization': {'那不勒斯': [[0, 3]], '锡耶纳': [[6...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>加勒比海盗3：世界尽头》的去年同期成绩死死甩在身后，后者则即将赶超《变形金刚》，</td>\n",
       "      <td>{'movie': {'加勒比海盗3：世界尽头》': [[0, 11]], '《变形金刚》'...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>布鲁京斯研究所桑顿中国中心研究部主任李成说，东亚的和平与安全，是美国的“核心利益”之一。</td>\n",
       "      <td>{'address': {'美国': [[32, 33]]}, 'organization'...</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                                                text  \\\n",
       "0  浙商银行企业信贷部叶老桂博士则从另一个角度对五道门槛进行了解读。叶老桂认为，对目前国内商业银...   \n",
       "1                                 生生不息CSOL生化狂潮让你填弹狂扫   \n",
       "2                             那不勒斯vs锡耶纳以及桑普vs热那亚之上呢？   \n",
       "3           加勒比海盗3：世界尽头》的去年同期成绩死死甩在身后，后者则即将赶超《变形金刚》，   \n",
       "4       布鲁京斯研究所桑顿中国中心研究部主任李成说，东亚的和平与安全，是美国的“核心利益”之一。   \n",
       "\n",
       "                                               label  \n",
       "0  {'name': {'叶老桂': [[9, 11]]}, 'company': {'浙商银行...  \n",
       "1                       {'game': {'CSOL': [[4, 7]]}}  \n",
       "2  {'organization': {'那不勒斯': [[0, 3]], '锡耶纳': [[6...  \n",
       "3  {'movie': {'加勒比海盗3：世界尽头》': [[0, 11]], '《变形金刚》'...  \n",
       "4  {'address': {'美国': [[32, 33]]}, 'organization'...  "
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "dftrain.head() "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "id": "287408f1-79b8-4456-ba65-0d006cdb2e3d",
   "metadata": {},
   "outputs": [],
   "source": [
    "def make_sample(text,label,tokenizer):\n",
    "    sample = tokenizer(text)\n",
    "    char_label = get_char_label(text,label)\n",
    "    token_label = get_token_label(text,char_label,tokenizer)\n",
    "    sample['labels'] = [label2id[x] for x in token_label]\n",
    "    return sample "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "id": "f8881334-399d-4339-867a-d50ca1809bbc",
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "100%|██████████| 10748/10748 [00:06<00:00, 1717.47it/s]\n",
      "100%|██████████| 10748/10748 [00:06<00:00, 1711.10it/s]\n"
     ]
    }
   ],
   "source": [
    "from tqdm import tqdm \n",
    "train_samples = [make_sample(text,label,tokenizer) for text,label in \n",
    "                 tqdm(list(zip(dftrain['text'],dftrain['label'])))]\n",
    "val_samples = [make_sample(text,label,tokenizer) for text,label in \n",
    "                 tqdm(list(zip(dfval['text'],dfval['label'])))]\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "id": "1c71aeed-22ed-4064-a8fc-f79784a6b819",
   "metadata": {},
   "outputs": [],
   "source": [
    "ds_train = datasets.Dataset.from_list(train_samples)\n",
    "ds_val = datasets.Dataset.from_list(val_samples)\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "id": "55021f14-029c-4fa9-ae18-b668d1450d12",
   "metadata": {},
   "outputs": [],
   "source": [
    "data_collator = DataCollatorForTokenClassification(tokenizer=tokenizer)\n",
    "dl_train = DataLoader(ds_train,batch_size=8,collate_fn=data_collator)\n",
    "dl_val = DataLoader(ds_val,batch_size=8,collate_fn=data_collator)\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 136,
   "id": "5130a81d-4af5-4895-b2d0-9f7e090d8316",
   "metadata": {},
   "outputs": [],
   "source": [
    "for batch in dl_train:\n",
    "    break "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "74b20fe0-df1e-4b31-9640-3362c12d3a8f",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c646df44-2f76-4383-a6b9-8cefa9972f69",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "id": "f1b7953f-ed3d-463f-ba1d-b5a9209592c3",
   "metadata": {},
   "source": [
    "## 2. Define the model"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 137,
   "id": "e5404c3e-e8e0-445a-9231-84fd4aeb2c9a",
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "Some weights of the model checkpoint at bert-base-chinese were not used when initializing BertForTokenClassification: ['cls.predictions.transform.LayerNorm.bias', 'cls.seq_relationship.weight', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.dense.bias', 'cls.seq_relationship.bias', 'cls.predictions.bias', 'cls.predictions.transform.dense.weight']\n",
      "- This IS expected if you are initializing BertForTokenClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).\n",
      "- This IS NOT expected if you are initializing BertForTokenClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).\n",
      "Some weights of BertForTokenClassification were not initialized from the model checkpoint at bert-base-chinese and are newly initialized: ['classifier.weight', 'classifier.bias']\n",
      "You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "21\n",
      "tensor(3.0772, grad_fn=<NllLossBackward0>)\n",
      "torch.Size([8, 52, 21])\n"
     ]
    }
   ],
   "source": [
    "from transformers import BertForTokenClassification\n",
    "\n",
    "net = BertForTokenClassification.from_pretrained(\n",
    "    model_name,\n",
    "    id2label=id2label,\n",
    "    label2id=label2id,\n",
    ")\n",
    "\n",
    "# freeze the BERT backbone so that only the classification head is trained\n",
    "for para in net.bert.parameters():\n",
    "    para.requires_grad_(False)\n",
    "\n",
    "print(net.config.num_labels) \n",
    "\n",
    "# sanity check: run one batch through the model\n",
    "out = net(**batch)\n",
    "print(out.loss) \n",
    "print(out.logits.shape)  \n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7e1b12a5-81db-4d76-b523-c478a5b3ccaf",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "80be6739-d40f-4247-838f-594b6e14b349",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "id": "05e048b7-d5de-47bf-bfa0-bef80212767f",
   "metadata": {},
   "source": [
    "## 3. Train the model"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 138,
   "id": "8490a0a4-c941-4adc-89f6-bf248e59c908",
   "metadata": {},
   "outputs": [],
   "source": [
    "import torch \n",
    "from torchkeras import KerasModel \n",
    "\n",
    "# We override StepRunner to fit the batch format used by transformers\n",
    "\n",
    "class StepRunner:\n",
    "    def __init__(self, net, loss_fn, accelerator, stage = \"train\", metrics_dict = None, \n",
    "                 optimizer = None, lr_scheduler = None\n",
    "                 ):\n",
    "        self.net,self.loss_fn,self.metrics_dict,self.stage = net,loss_fn,metrics_dict,stage\n",
    "        self.optimizer,self.lr_scheduler = optimizer,lr_scheduler\n",
    "        self.accelerator = accelerator\n",
    "        if self.stage=='train':\n",
    "            self.net.train() \n",
    "        else:\n",
    "            self.net.eval()\n",
    "    \n",
    "    def __call__(self, batch):\n",
    "        \n",
    "        out = self.net(**batch)\n",
    "        \n",
    "        #loss\n",
    "        loss= out.loss\n",
    "        \n",
    "        #preds\n",
    "        preds =(out.logits).argmax(axis=2) \n",
    "    \n",
    "        #backward()\n",
    "        if self.optimizer is not None and self.stage==\"train\":\n",
    "            self.accelerator.backward(loss)\n",
    "            self.optimizer.step()\n",
    "            if self.lr_scheduler is not None:\n",
    "                self.lr_scheduler.step()\n",
    "            self.optimizer.zero_grad()\n",
    "        \n",
    "        all_loss = self.accelerator.gather(loss).sum()\n",
    "        \n",
    "        labels = batch['labels']\n",
    "        \n",
    "        #token-level precision & recall; padded positions carry label -100 and are masked out\n",
    "        valid = labels!=-100\n",
    "        \n",
    "        precision =  (((preds>0)&(preds==labels)).sum())/(\n",
    "            torch.maximum(((preds>0)&valid).sum(),torch.tensor(1.0).to(preds.device)))\n",
    "        recall =  (((labels>0)&(preds==labels)).sum())/(\n",
    "            torch.maximum((labels>0).sum(),torch.tensor(1.0).to(labels.device)))\n",
    "    \n",
    "        \n",
    "        all_precision = self.accelerator.gather(precision).mean()\n",
    "        all_recall = self.accelerator.gather(recall).mean()\n",
    "        \n",
    "        #F1 = 2PR/(P+R); the small floor only guards against division by zero\n",
    "        f1 = 2*all_precision*all_recall/torch.maximum(\n",
    "            all_recall+all_precision,torch.tensor(1e-8).to(labels.device))\n",
    "        \n",
    "        #losses\n",
    "        step_losses = {self.stage+\"_loss\":all_loss.item(), \n",
    "                       self.stage+'_precision':all_precision.item(),\n",
    "                       self.stage+'_recall':all_recall.item(),\n",
    "                       self.stage+'_f1':f1.item()\n",
    "                      }\n",
    "        \n",
    "        #metrics\n",
    "        step_metrics = {}\n",
    "        \n",
    "        if self.stage==\"train\":\n",
    "            if self.optimizer is not None:\n",
    "                step_metrics['lr'] = self.optimizer.state_dict()['param_groups'][0]['lr']\n",
    "            else:\n",
    "                step_metrics['lr'] = 0.0\n",
    "        return step_losses,step_metrics\n",
    "    \n",
    "KerasModel.StepRunner = StepRunner \n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 158,
   "id": "3d3c2df1-f614-4efb-bf48-33dc8e0d64f0",
   "metadata": {},
   "outputs": [],
   "source": [
    "optimizer = torch.optim.AdamW(net.parameters(), lr=3e-5)\n",
    "\n",
    "keras_model = KerasModel(net,\n",
    "                   loss_fn=None,\n",
    "                   optimizer = optimizer\n",
    "                   )\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c811af7a-4bb0-4376-bc8b-e7427e55c52b",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": 159,
   "id": "c61dafd7-9c68-4577-915a-383e289f231b",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\u001b[0;31m<<<<<< ⚡️ cuda is used >>>>>>\u001b[0m\n"
     ]
    },
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAiEAAAGJCAYAAABcsOOZAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjcuMSwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/bCgiHAAAACXBIWXMAAA9hAAAPYQGoP6dpAABLAUlEQVR4nO3deVxU5eIG8GcYYEBkEZBFREClXMIlFa+7FqXpdc0td/1pm7mmV83cU7SupqZm3mtW5m5UlqUpoTd3EzFJRU3cWVxBURZn3t8fpxkYGGCGWQ6Mz/fzOR/gPe855z3DmTnPec8yCiGEABEREZGNOcjdACIiIno6MYQQERGRLBhCiIiISBYMIURERCQLhhAiIiKSBUMIERERyYIhhIiIiGTBEEJERESyYAghIiIiWTCEEFnZ7NmzoVAocPv2bbmbYjOXL1+GQqHAF198YdJ0aWlp6N27N3x8fKBQKLB06VKrtI+IygeGECI7tWDBAnz33XdyN8MkEyZMwO7duzFt2jSsX78enTp1AgDMnz8f3bp1g7+/PxQKBWbPni1bGzUaDT788EOEhYXBxcUFDRo0wKZNm4yatn379lAoFAYHJycnXb07d+7go48+Qtu2bVG1alV4eXnhH//4B7Zs2VJknvv27St2nkeOHClSPzc3FwsWLECdOnXg4uICf39/dOnSBdevXy/7i0JURo5yN4CIrGPBggXo3bs3evToIXdTjPbrr7+ie/fumDRpkl75+++/j4CAADRu3Bi7d++WqXWS6dOnY+HChRg1ahSaNWuG77//HgMGDIBCoUD//v1LnXbkyJF6ZVlZWXjzzTfx8ssv68oOHz6M6dOno3Pnznj//ffh6OiIb775Bv3798eZM2cwZ86cIvMeO3YsmjVrpldWu3Ztvb/z8vLQpUsXHDp0CKNGjUKDBg1w7949HD16FBkZGahevbqpLweRWRhCiKjcSE9Ph5eXV5Hy5ORkhIaG4vbt26hatartG/a3GzduYPHixRg9ejRWrFgBABg5ciTatWuHyZMno0+fPlAqlcVO/9JLLxUp+/rrrwEAAwcO1JXVr18fFy5cQEhIiK7s7bffRlRUFBYtWoR//etfcHNz05tPmzZt0Lt37xLb//HHH2P//v04cOAAIiMjS19hIivj6RgiG7l9+zb69u0LDw8P+Pj4YNy4ccjOzi5S7+uvv0aTJk3g6uoKb29v9O/fH9euXdOrc+HCBbz66qsICAiAi4sLqlevjv79+yMjIwMAoFAokJWVhS+//FLXNT9s2DCD7UpLS4Ojo6PBo+ukpCQoFArdDvfu3buYNGkSIiIiULlyZXh4eOCVV17BqVOnzHptvvjiCygUCgghsHLlSl2btUJDQ82av6V8//33yMvLw9tvv60rUygUeOutt3D9+nUcPnzY5Hlu3LgRbm5u6N69u64sLCxML4Bol9OjRw/k5OTg0qVLBuf14MEDPHnyxOA4jUaDZcuWoWfPnoiMjMSTJ0/w6NEjk9tLZEkMIUQ20rdvX2RnZyM6OhqdO3fG8uXL8frrr+vVmT9/PoYMGYLw8HAsWbIE48ePR2xsLNq2bYv79+8DkM7pd+zYEUeOHMGYMWOwcuVKvP7667h06ZKuzvr166FSqdCmTRusX78e69evxxtvvGGwXf7+/mjXrh22bt1aZNyWLVugVCrRp08fAMClS5fw3Xff4Z///CeWLFmCyZMn4/Tp02jXrh1u3rxZ5tembdu2WL9+PQCpt0DbZkvIy8vD7du3jRo0Gk2J8zp58iTc3NxQt25dvXJtr8LJkydNatutW7ewZ88e9OjRo0jPhiGpqakAAF9f3yLjhg8fDg8PD7i4uKBDhw74/fff9cafOXMGN2/eRIMGDfD666/Dzc0Nbm5uaNCgAeLi4kxqN5HFCCKyqlmzZgkAolu3bnrlb7/9tgAgTp06JYQQ4vLly0KpVIr58+fr1Tt9+rRwdHTUlZ88eVIA
ENu2bStxuW5ubmLo0KFGtfGzzz4TAMTp06f1yuvVqydeeOEF3d/Z2dlCrVbr1UlOThYqlUrMnTtXrwyAWLdunVHL1wIgRo8eXez4W7duCQBi1qxZRs8zLi5OADBqSE5OLnFeXbp0ETVr1ixSnpWVJQCIqVOnGt0uIYT45JNPBADx008/lVr3zp07ws/PT7Rp00av/ODBg+LVV18Va9euFd9//72Ijo4WPj4+wsXFRcTHx+vqxcTECADCx8dHhIeHi3Xr1ol169aJ8PBw4ezsrNsOiWyJ14QQ2cjo0aP1/h4zZgxWrVqFn376CQ0aNEBMTAw0Gg369u2rdztvQEAAwsPDERcXh/feew+enp4AgN27d6Nz586oVKmS2W3r1asXRo8ejS1btuC5554DACQmJuLMmTMYN26crp5KpdL9rlarcf/+fVSuXBnPPvss4uPjzW6HNTRs2BB79uwxqm5AQECJ4x8/fqz3Gmi5uLjoxpti48aNqFq1qsFrRQrSaDQYOHAg7t+/j08++URvXMuWLdGyZUvd3926dUPv3r3RoEEDTJs2Dbt27QIAPHz4EIB0yubkyZMIDg4GALzwwguoXbs2PvzwQ931KUS2whBCZCPh4eF6f9eqVQsODg64fPkyAOk6DyFEkXpa2ls4w8LCMHHiRCxZsgQbNmxAmzZt0K1bNwwaNEgXUEzl6+uLF198EVu3bsW8efMASKdiHB0d0atXL1097XUFq1atQnJyMtRqtW6cj49PmZZtbVWqVEFUVJRF5uXq6oqcnJwi5dpre1xdXY2e16VLl3D48GG88847cHQs+aN4zJgx2LVrF7766is0bNiw1HnXrl0b3bt3R0xMDNRqNZRKpa5trVq10gUQAKhRowZat26NQ4cOGd12IkthCCGSScELLwFpB69QKPDzzz8bvMOicuXKut8XL16MYcOG4fvvv8cvv/yCsWPHIjo6GkeOHCnzbZb9+/fH8OHDkZCQgEaNGmHr1q148cUX9a4/WLBgAWbMmIERI0Zg3rx58Pb2hoODA8aPH1/q9RRyyc3Nxd27d42qW7Vq1RLvbgkMDERcXByEEHr/v5SUFABAtWrVjG7Xxo0bAejfFWPInDlzsGrVKixcuBCDBw82ev7BwcHIzc1FVlYWPDw8dG3z9/cvUtfPz8/k61mILIEhhMhGLly4gLCwMN3fFy9ehEaj0d35UatWLQghEBYWhmeeeabU+UVERCAiIgLvv/8+Dh06hFatWmH16tX44IMPABQNOaXp0aMH3njjDd0Dsc6fP49p06bp1dm+fTs6dOiAtWvX6pXfv3/f4MWS5cGhQ4fQoUMHo+pqbwUuTqNGjfDf//4XZ8+eRb169XTlR48e1Y031saNG1GrVi384x//KLbOypUrMXv2bIwfPx5Tpkwxet6A1NPi4uKiC68RERFwcnLCjRs3itS9efOmrLc+09OLd8cQ2cjKlSv1/tae23/llVcASNdlKJVKzJkzB0IIvbpCCNy5cwcAkJmZWeQ2zIiICDg4OOidKnBzc9PdLWMMLy8vdOzYEVu3bsXmzZvh7Oxc5EFnSqWySNu2bdtmcMdWXmivCTFmKO2akO7du8PJyQmrVq3SlQkhsHr1agQFBeldm5GSkoJz584hLy+vyHxOnjyJs2fPYsCAAcUua8uWLRg7diwGDhyIJUuWFFvv1q1bRcpOnTqFHTt24OWXX4aDg/Qx7+7ujs6dO+PQoUM4d+6cru7Zs2dx6NChUq9LIbIG9oQQ2UhycjK6deuGTp064fDhw/j6668xYMAA3Tn+WrVq4YMPPsC0adNw+fJl9OjRA+7u7khOTsa3336L119/HZMmTcKvv/6Kd955B3369MEzzzyDJ0+eYP369VAqlXj11Vd1y2vSpAn27t2LJUuWoFq1aggLC0Pz5s1LbGO/fv0waNAgrFq1Ch07dizy4LB//vOfmDt3LoYPH46WLVvi9OnT2LBhA2rWrGnx16ug9evX48qVK7rn
Wvzvf//T9fgMHjy4yDM1CrLkNSHVq1fH+PHj8dFHHyEvLw/NmjXDd999h99++w0bNmzQO5Uzbdo0fPnllwZ7VzZs2ACg+FMxx44dw5AhQ+Dj44MXX3xRV1+rZcuWute8X79+cHV1RcuWLeHn54czZ85gzZo1qFSpEhYuXKg33YIFCxAbG4sXXngBY8eOBQAsX74c3t7eeO+998x6bYjKRMY7c4ieCtpbdM+cOSN69+4t3N3dRZUqVcQ777wjHj9+XKT+N998I1q3bi3c3NyEm5ubqFOnjhg9erRISkoSQghx6dIlMWLECFGrVi3h4uIivL29RYcOHcTevXv15nPu3DnRtm1b4erqKgAYdbtuZmamrv7XX39dZHx2drZ49913RWBgoHB1dRWtWrUShw8fFu3atRPt2rXT1bP0Lbrt2rUr9rbauLg4k5ZhLrVaLRYsWCBCQkKEs7OzqF+/vsHXaujQoQZv+1Wr1SIoKEg8//zzxS5j3bp1Jd5KXPB1XbZsmYiMjBTe3t7C0dFRBAYGikGDBokLFy4YnPeJEydEVFSUcHNzE+7u7qJ79+7i/PnzZXotiMylEKJQ3yoRERGRDfCaECIiIpIFrwkhIqsy5hZZT09Pk56xQUT2gSGEiKzKmFtk161bV+wX7BGR/eI1IURkVffu3cOJEydKrFO/fn0EBgbaqEVEVF4whBAREZEseGEqERERyYLXhBig0Whw8+ZNuLu7m/zoayIioqeZEAIPHjxAtWrVdE/sLQ5DiAE3b97U+5ZJIiIiMs21a9dK/UJNhhAD3N3dAUgvoIeHh8ytISIiqjgyMzMRHBys25eWhCHEAO0pGA8PD4YQIiKiMjDmcgZemEpERESyYAghIiIiWTCEEBERkSwYQoiIiEgWDCFEREQkC4YQIiIikgVv0SUielqo1cBvvwEpKUBgINCmDaBUVtzlUIXHEEJEFQt3cGUTEwOMGwdcv55fVr06sGwZ0KtXxVsO2QV+i64BmZmZ8PT0REZGBh9WRlSe2OMOzhahKiYG6N0bKPxxr32Y1Pbtlnn9bLWcghhKyx1T9qEMIQYwhBCZyJ52pFq2Widrhyq1GggJAW7cKL6Ovz+wcyfg6go4OxsenJzyX+vilhMaqr8uBSkU0rolJ1vudbRlKLVl2Kngp80YQszEEPKUquBvfNnYakdqyx2cLdbJnFAlBHD/PpCWBqSnS4Oh39PSgJs3gUePLNPm4gKKszOQmwtcvFj6PBYtAl54AahaFfD1BdzcytYWW4ZSW4YdOzhtxhBiJoaQcsZejkhtuRwta7925uwIHj0C7tzRH+7eNVx27VrxAaSgLl2A558HAgKkITAw/3dXV+uvk7FKC1UA4O0NTJkC3LpVNGSkpwN5eea1oTAvL8DRUQoTOTnST1vsHlxdpTCiDSWl/fT2lqazVSi1ddixg9NmDCFmYggpR8r7EWl5XE7B5VnztTNmR+ruDvTrB9y7VzRYZGeb3wZTeHgUDSaF/65aFWje3Lidm4MD8Pgx8OBB0SEzs+Sya9eA06fNXydPT8DPTxr8/Yv+7u8vtXXo0NLnFRcHtG+f/7cQ0v84N9e44fhx4F//Kn05tWpJIefWLemnqRQKabvKzCy97uDB0vIcHaUwUnAwVFa4HABef13aZovj7w/s2CGdrnJwkKYr+NPY3wGgXr3iT5spFNK2evgwoNEAT55IQ16eab/n5ACTJ0vvyeKWY2aAYwgxU4UOIfZ0SqE8HJFa6ohKjtMJprx2Go30oX7vnhQQtD8L/l647OZN4PZt89rp5AT4+EhHtz4+RQdt+dWrwPjxpc9v2DDpyDo1NX9ISbF84HFzk+apVlt2voW1aiWFosLBws9PCkwuLqXPQ7vt3bhhuGfD0tu4scsRAsjKksLI7dvG/Sxux0mWVziUmoAhxEwVNoTY0ykFY3bagYHAvn3SUVh2tjQ8flz095LKrl6V3mylCQ+Xjr4KLr80BetkZgLnz5c+TbNm0k7G
0TF/0B6dFTcUHu/gAERHAxkZxS/HxQWIiJA+1LWDRlN6+8qid2/pw8xQ2Khc2bjX0pwdqRDS618wlBQOKdrf09PLto6VK0s9Le7uRQdD5VevAnPmlD5fM3YEerShFNB//azV22et5Tx5IvVK/PQTMGJE6fV79JB6uJ48kbahgoOhssLlaWnGXefi7S29pzQaaTqNpvjftT/LysEh/0Jh7XvelN/T04FTp0pfzsaNwGuvlamJDCFmsngIsZdeA0svR6ORjqoLnufWDgkJwA8/mN9eMk2lSkCVKtKHauGfhcuSk4E33yx9nhVpRxobC0RFlV7vyy+leu7uUq+ItjvdWLbqnSjI0MFDcDCwdKn1D1IsvRxbvX779gEdOpReryzbuPZ0lzaY7NsHdO5snWUVZM11+htDiJksGkLKS6+BrU4pVKsG/PqrdLRiKFwUHG7fNv/oW6WSjkJdXaUjEe3Pgr+XVHb9OvDJJ6UvZ9EioEED6Xdj3jKF6/zxBzBtWunTTZ0q9bpoz99qB+0RmqGh8Ljz56XQW5p335WOFAsGDJWq9Om07HFHast1slXvREH2eLoWsN7rZ8vtobyeNisDhhAzWSyEWKrXIDe35LsHEhOBn38ufT4BAdIOu/BFUaVdNKX9ef++dPGZpXl7519Qpz3X/eiRdLRZGnOPCuzoja9jgyMdHXvckdpynWzVO2GvbPH62Xp7sIPTZgwhZrJICDHmzgFfX2Dx4vwL/gqHC+3w8GHZ2iAXlUrqESkYLAoGjIJ/+/pK5yoLs8cjUlstx9Y9FPa4I7XlOtnbc2NsTa5b+K21PdjBaTOGEDNZJIQYezRqLAcHqbvc0B0EGRnA2rWlz2PlSqBRo9IvmCru98REYOHC0pdTka4BKLisCv7GL7IcW/ZQ2OOO1B7XicqOT0w1GkOImSwSQjZtAgYMKL1e/frSUNytidrBy6v4i9/s8ZSClj0ekdpqOfbYQ0FE5R5DiJls2hNS0XoN7PEaAHvG146IbIwhxEwWvSbEHnsNeIRNRETFYAgxk8XvjgHsr9eAR9hERGQAQ4iZrP6cEPYaEBGRnTJlH+poozY9vXr1Arp3Z68BERFRIQwhtqBUWubiUyIiIjti4hceEBEREVkGQwgRERHJgiGEiIiIZMEQQkRERLJgCCEiIiJZMIQQERGRLBhCiIiISBYMIURERCQLhhAiIiKSBUMIERERyYIhhIiIiGTBEEJERESyYAghIiIiWZSLELJy5UqEhobCxcUFzZs3x7Fjx4qt2759eygUiiJDly5dAAB5eXmYMmUKIiIi4ObmhmrVqmHIkCG4efOmrVaHiIiIjCB7CNmyZQsmTpyIWbNmIT4+Hg0bNkTHjh2Rnp5usH5MTAxSUlJ0Q2JiIpRKJfr06QMAePToEeLj4zFjxgzEx8cjJiYGSUlJ6Natmy1Xi4iIiEqhEEIIORvQvHlzNGvWDCtWrAAAaDQaBAcHY8yYMZg6dWqp0y9duhQzZ85ESkoK3NzcDNY5fvw4IiMjceXKFdSoUaPUeWZmZsLT0xMZGRnw8PAwbYWIiIieYqbsQ2XtCcnNzcWJEycQFRWlK3NwcEBUVBQOHz5s1DzWrl2L/v37FxtAACAjIwMKhQJeXl4Gx+fk5CAzM1NvICIiIuuSNYTcvn0barUa/v7+euX+/v5ITU0tdfpjx44hMTERI0eOLLZOdnY2pkyZgtdee63YRBYdHQ1PT0/dEBwcbNqKEBERkclkvybEHGvXrkVERAQiIyMNjs/Ly0Pfvn0hhMCnn35a7HymTZuGjIwM3XDt2jVrNZmIiIj+5ijnwn19faFUKpGWlqZXnpaWhoCAgBKnzcrKwubNmzF37lyD47UB5MqVK/j1119LPC+lUqmgUqlMXwEiIiIqM1l7QpydndGkSRPExsbqyjQaDWJjY9GiRYsSp922bRtycnIwaNCgIuO0AeTChQvYu3cvfHx8LN52IiIiMo+sPSEAMHHiRAwdOhRNmzZF
ZGQkli5diqysLAwfPhwAMGTIEAQFBSE6OlpvurVr16JHjx5FAkZeXh569+6N+Ph4/Pjjj1Cr1brrS7y9veHs7GybFSMiIqISyR5C+vXrh1u3bmHmzJlITU1Fo0aNsGvXLt3FqlevXoWDg36HTVJSEg4cOIBffvmlyPxu3LiBHTt2AAAaNWqkNy4uLg7t27e3ynoQERGRaWR/Tkh5xOeEEBERlU2FeU4IERERPb0YQoiIiEgWDCFEREQkC4YQIiIikgVDCBEREcmCIYSIiIhkwRBCREREsmAIISIiIlkwhBAREZEsGEKIiIhIFgwhREREJAuGECIiIpIFQwgRERHJgiGEiIiIZMEQQkRERLJgCCEiIiJZMIQQERGRLBhCiIiISBYMIURERCQLhhAiIiKSBUMIERERyYIhhIiIiGTBEEJERESyYAghIiIiWTCEEBERkSwYQoiIiEgWDCFEREQkC4YQIiIikgVDCBEREcmCIYSIiIhkwRBCREREsmAIISIiIlkwhBAREZEsGEKIiIhIFgwhREREJAuGECIiIpIFQwgRERHJgiGEiIiIZMEQQkRERLJgCCEiIiJZMIQQERGRLBhCiIiISBYMIURERCQLhhAiIiKSBUMIERERyYIhhIiIiGTBEEJERESyYAghIiIiWZSLELJy5UqEhobCxcUFzZs3x7Fjx4qt2759eygUiiJDly5ddHViYmLw8ssvw8fHBwqFAgkJCTZYCyIiIjKF7CFky5YtmDhxImbNmoX4+Hg0bNgQHTt2RHp6usH6MTExSElJ0Q2JiYlQKpXo06ePrk5WVhZat26NRYsW2Wo1iIiIyEQKIYSQswHNmzdHs2bNsGLFCgCARqNBcHAwxowZg6lTp5Y6/dKlSzFz5kykpKTAzc1Nb9zly5cRFhaGkydPolGjRka3KTMzE56ensjIyICHh4dJ60NERPQ0M2UfKmtPSG5uLk6cOIGoqChdmYODA6KionD48GGj5rF27Vr079+/SAAxRU5ODjIzM/UGIiIisi5ZQ8jt27ehVqvh7++vV+7v74/U1NRSpz927BgSExMxcuRIs9oRHR0NT09P3RAcHGzW/IiIiKh0sl8TYo61a9ciIiICkZGRZs1n2rRpyMjI0A3Xrl2zUAuJiIioOI5yLtzX1xdKpRJpaWl65WlpaQgICChx2qysLGzevBlz5841ux0qlQoqlcrs+RAREZHxZO0JcXZ2RpMmTRAbG6sr02g0iI2NRYsWLUqcdtu2bcjJycGgQYOs3UwiIiKyAll7QgBg4sSJGDp0KJo2bYrIyEgsXboUWVlZGD58OABgyJAhCAoKQnR0tN50a9euRY8ePeDj41Nknnfv3sXVq1dx8+ZNAEBSUhIAICAgoNQeFiIiIrIN2UNIv379cOvWLcycOROpqalo1KgRdu3apbtY9erVq3Bw0O+wSUpKwoEDB/DLL78YnOeOHTt0IQYA+vfvDwCYNWsWZs+ebZ0VISIiIpPI/pyQ8ojPCSEiIiqbCvOcECIiInp6MYQQERGRLBhCiIiISBYMIURERCQLhhAiIiKSBUMIERERyYIhhIiIiGTBEEJERESyYAghIiIiWTCEEBERkSwYQoiIiEgWDCFEREQkC4YQIiIikgVDCBEREcmCIYSIiIhkwRBCREREsmAIISIiIlk4yt0AIiKikggh8OTJE6jVarmbQgCUSiUcHR2hUCjMnhdDCBERlVu5ublISUnBo0eP5G4KFVCpUiUEBgbC2dnZrPkwhBARUbmk0WiQnJwMpVKJatWqwdnZ2SJH31R2Qgjk5ubi1q1bSE5ORnh4OBwcyn5lB0MIERGVS7m5udBoNAgODkalSpXkbg79zdXVFU5OTrhy5Qpyc3Ph4uJS5nnxwlQiIirXzDnSJuuw1P+E/1kiIiKSBUMIERERyYIhhIiI7JpaDezbB2zaJP2saHf6hoaGYunSpRab38GDBxEREQEnJyf06NHDYvMtC16YSkREdismBhg3
Drh+Pb+senVg2TKgVy/rLbd9+/Zo1KiRRcLD8ePH4ebmZn6j/jZx4kQ0atQIP//8MypXrgwAGDt2LA4ePIjExETUrVsXCQkJFlteSdgTQkREdikmBujdWz+AAMCNG1J5TIw87QLyH8BmjKpVq1r07qC//voLL7zwAqpXrw4vLy9d+YgRI9CvXz+LLccYDCFERFShZGUVP2RnS3XUaqkHRIii02vLxo3TPzVT3DxNNWzYMOzfvx/Lli2DQqGAQqHAF198AYVCgZ9//hlNmjSBSqXCgQMH8Ndff6F79+7w9/dH5cqV0axZM+zdu1dvfoVPxygUCvz3v/9Fz549UalSJYSHh2PHjh2ltuvy5ctQKBS4c+cORowYoWsXACxfvhyjR49GzZo1TV9hMzCEEBFRhVK5cvHDq69KdX77rWgPSEFCSON/+y2/LDTU8DxNtWzZMrRo0QKjRo1CSkoKUlJSEBwcDACYOnUqFi5ciLNnz6JBgwZ4+PAhOnfujNjYWJw8eRKdOnVC165dcfXq1RKXMWfOHPTt2xd//PEHOnfujIEDB+Lu3bslThMcHIyUlBR4eHhg6dKlSElJsXnPR2EMIUREZHdSUixbzxSenp5wdnZGpUqVEBAQgICAACiVSgDA3Llz8dJLL6FWrVrw9vZGw4YN8cYbb+C5555DeHg45s2bh1q1apXaszFs2DC89tprqF27NhYsWICHDx/i2LFjJU6jVCoREBAAhUIBT09PBAQEwNXV1WLrXRa8MJWIiCqUhw+LH/f3vh6BgcbNq2C9y5fL3CSjNW3aVO/vhw8fYvbs2di5cydSUlLw5MkTPH78uNSekAYNGuh+d3Nzg4eHB9LT063SZmtiCCEiogrFmBtF2rSR7oK5ccPwdSEKhTS+TRvT5muuwne5TJo0CXv27MG///1v1K5dG66urujduzdyc3NLnI+Tk5Pe3wqFAhqNxuLttTaGECIisjtKpXQbbu/eUuAoGES034G3dGl+z4mlOTs7Q23EA0kOHjyIYcOGoWfPngCknpHLtuiSKSd4TQgREdmlXr2A7duBoCD98urVpXJrPickNDQUR48exeXLl3H79u1ieynCw8MRExODhIQEnDp1CgMGDJClR+PixYtISEhAamoqHj9+jISEBCQkJJTaI2Mu9oQQEZHd6tUL6N5dugsmJUW6BqRNG+v1gGhNmjQJQ4cORb169fD48WOsW7fOYL0lS5ZgxIgRaNmyJXx9fTFlyhRkZmZat3EGjBw5Evv379f93bhxYwBAcnIyQkNDrbZchRCGzpY93TIzM+Hp6YmMjAx4eHjI3RwioqdSdnY2kpOTERYWZtbXxZPllfS/MWUfytMxREREJAuLhZBr165hxIgRlpodERERmejNN99E5cqVDQ5vvvmm3M0rwmKnY06dOoXnn3/eqKuByzuejiEikh9Px5guPT292GtKPDw84OfnZ5HlWOp0jNEXppb29LZLly4ZOysiIiKyAj8/P4sFDVswOoT06NEDCoUCJXWcKLQ3XxMRERGVwuhrQgIDAxETEwONRmNwiI+Pt2Y7iYiIyM4YHUKaNGmCEydOFDu+tF4SIiIiooKMOh3zxx9/YPLkycjKyiq2Tu3atREXF2exhhEREZF9MyqENG7cGCkpKfDz80PNmjVx/Phx+Pj46NVxc3NDu3btrNJIIiIisj9GnY7x8vJCcnIyAODy5csV8pv6iIjoKaVWA/v2AZs2ST8rwKMkQkNDsXTpUqPqpqam4qWXXoKbmxu8vLys2i5LM6on5NVXX0W7du0QGBgIhUKBpk2bQlnMg/d5qy4REZUbMTHAuHHA9ev5ZdWrS1+xa81vsLOhjz/+GCkpKUhISICnpycAYM2aNdi4cSPi4+Px4MED3Lt3r1wGFKNCyJo1a9CrVy9cvHgRY8eOxahRo+Du7m7tthEREZVdTAzQuzdQ+KaJGzekcmt/la6N/PXXX2jSpAnCw8N1ZY8ePUKnTp3QqVMnTJs2
TcbWlUKYaNiwYSIzM9PUyUq0YsUKERISIlQqlYiMjBRHjx4ttm67du0EgCJD586ddXU0Go2YMWOGCAgIEC4uLuLFF18U58+fN7o9GRkZAoDIyMgwa72IiKjsHj9+LM6cOSMeP34sFWg0Qjx8aNyQkSFEUJAQUgQpOigUQlSvLtUzZn4ajdHt/uyzz0RgYKBQq9V65d26dRPDhw8XFy9eFN26dRN+fn7Czc1NNG3aVOzZs0evbkhIiPj4449LXVZISIjevnDo0KF64+Pi4gQAce/ePaPbb4wi/5sCTNmHmvzdMevWrbNoL8iWLVswceJEzJo1C/Hx8WjYsCE6duyI9PR0g/VjYmKQkpKiGxITE6FUKtGnTx9dnQ8//BDLly/H6tWrcfToUbi5uaFjx47Izs62WLuJiMjGHj0CKlc2bvD0lHo8iiOEdIrG09O4+T16ZHQz+/Tpgzt37ujdMXr37l3s2rULAwcOxMOHD9G5c2fExsbi5MmT6NSpE7p27YqrV6+a/JIcP34cnTp1Qt++fZGSkoJly5aZPA85yf4tukuWLMGoUaMwfPhw1KtXD6tXr0alSpXw+eefG6zv7e2NgIAA3bBnzx5UqlRJF0KEEFi6dCnef/99dO/eHQ0aNMBXX32Fmzdv4rvvvrPhmhER0dOoSpUqeOWVV7Bx40Zd2fbt2+Hr64sOHTqgYcOGeOONN/Dcc88hPDwc8+bNQ61atUr9ehRDqlatCpVKBVdXVwQEBOiuCakoZA0hubm5OHHiBKKionRlDg4OiIqKwuHDh42ax9q1a9G/f3+4ubkBAJKTk5Gamqo3T09PTzRv3rzYeebk5CAzM1NvICKicqZSJeDhQ+OGn34ybp4//WTc/CpVMqmpAwcOxDfffIOcnBwAwIYNG9C/f384ODjg4cOHmDRpEurWrQsvLy9UrlwZZ8+eLVNPSEVn9HfHWMPt27ehVqvh7++vV+7v749z586VOv2xY8eQmJiItWvX6spSU1N18yg8T+24wqKjozFnzhxTm09ERLakUAB/H3CW6uWXpbtgbtwoemGqdl7Vq0v1irnb0xxdu3aFEAI7d+5Es2bN8Ntvv+Hjjz8GAEyaNAl79uzBv//9b9SuXRuurq7o3bs3cnNzLd6O8k720zHmWLt2LSIiIhAZGWnWfKZNm4aMjAzdcO3aNQu1kIiIZKFUSrfhAlLgKEj799KlVgkgAODi4oJevXphw4YN2LRpE5599lk8//zzAICDBw9i2LBh6NmzJyIiIhAQEIDLly9bpR3lnawhxNfXF0qlEmlpaXrlaWlpCAgIKHHarKwsbN68Gf/3f/+nV66dzpR5qlQqeHh46A1ERFTB9eol3YYbFKRfXr26TW7PHThwIHbu3InPP/8cAwcO1JWHh4cjJiYGCQkJOHXqFAYMGGDxh4CmpqYiISEBFy9eBACcPn0aCQkJuHv3rkWXYy5ZQ4izszOaNGmC2NhYXZlGo0FsbCxatGhR4rTbtm1DTk4OBg0apFceFhaGgIAAvXlmZmbi6NGjpc6TiIjsTK9ewOXLQFwcsHGj9DM52SbPB3nhhRfg7e2NpKQkDBgwQFe+ZMkSVKlSBS1btkTXrl3RsWNHXS+JpaxevRqNGzfGqFGjAABt27ZF48aNy3TxqzUphJD3q2+3bNmCoUOH4rPPPkNkZCSWLl2KrVu34ty5c/D398eQIUMQFBSE6OhovenatGmDoKAgbN68ucg8Fy1ahIULF+LLL79EWFgYZsyYgT/++ANnzpyBi4tLqW3KzMyEp6cnMjIy2CtCRCST7OxsJCcnIywszKjPbrKdkv43puxDZb0wFQD69euHW7duYebMmUhNTUWjRo2wa9cu3YWlV69ehYODfodNUlISDhw4gF9++cXgPP/1r38hKysLr7/+Ou7fv4/WrVtj165d3IiJiIjKEdl7Qsoj9oQQEcmPPSHSrb1vvPGGwXEhISH4888/bdwiid30hBAREZFh3bp1Q/PmzQ2O
c3JysnFrLI8hhIiIqJxyd3e36y+MrdDPCSEiIvvHqwbKH0v9TxhCiIioXNKebnhkwpfHkW1o/yfmnhLi6RgiIiqXlEolvLy8dN+qXqlSJSgKP/2UbEoIgUePHiE9PR1eXl5QmvnEWYYQIiIqt7RPutYGESofvLy8Sn2yuTEYQoiIqNxSKBQIDAyEn58f8vLy5G4OQToFY24PiBZDCBERlXtKpdJiOz4qP3hhKhEREcmCIYSIiIhkwRBCREREsmAIISIiIlkwhBAREZEsGEKIiIhIFgwhREREJAuGECIiIpIFQwgRERHJgiGEiIiIZMEQQkRERLJgCCEiIiJZMIQQERGRLBhCiIiISBYMIURERCQLhhAiIiKSBUMIERERyYIhhIiIiGTBEEJERESyYAghIiIiWTCEEBERkSwYQoiIiEgWDCFEREQkC4YQIiIikgVDCBEREcmCIYSIiIhkwRBCREREsmAIISIiIlkwhBAREZEsGEKIiIhIFgwhREREJAuGECIiIpIFQwgRERHJgiGEiIiIZMEQQkRERLJgCCEiIiJZMIQQERGRLBhCiIiISBYMIURERCQLhhAiIiKShewhZOXKlQgNDYWLiwuaN2+OY8eOlVj//v37GD16NAIDA6FSqfDMM8/gp59+0o1/8OABxo8fj5CQELi6uqJly5Y4fvy4tVeDiIiITCRrCNmyZQsmTpyIWbNmIT4+Hg0bNkTHjh2Rnp5usH5ubi5eeuklXL58Gdu3b0dSUhL+85//ICgoSFdn5MiR2LNnD9avX4/Tp0/j5ZdfRlRUFG7cuGGr1SIiIiIjKIQQQq6FN2/eHM2aNcOKFSsAABqNBsHBwRgzZgymTp1apP7q1avx0Ucf4dy5c3Bycioy/vHjx3B3d8f333+PLl266MqbNGmCV155BR988IFR7crMzISnpycyMjLg4eFRxrUjIiJ6+piyD5WtJyQ3NxcnTpxAVFRUfmMcHBAVFYXDhw8bnGbHjh1o0aIFRo8eDX9/fzz33HNYsGAB1Go1AODJkydQq9VwcXHRm87V1RUHDhwoti05OTnIzMzUG4iIiMi6ZAsht2/fhlqthr+/v165v78/UlNTDU5z6dIlbN++HWq1Gj/99BNmzJiBxYsX63o43N3d0aJFC8ybNw83b96EWq3G119/jcOHDyMlJaXYtkRHR8PT01M3BAcHW25FiYiIyCDZL0w1hUajgZ+fH9asWYMmTZqgX79+mD59OlavXq2rs379egghEBQUBJVKheXLl+O1116Dg0Pxqzpt2jRkZGTohmvXrtlidYiIiJ5qjnIt2NfXF0qlEmlpaXrlaWlpCAgIMDhNYGAgnJycoFQqdWV169ZFamoqcnNz4ezsjFq1amH//v3IyspCZmYmAgMD0a9fP9SsWbPYtqhUKqhUKsusGBERERlFtp4QZ2dnNGnSBLGxsboyjUaD2NhYtGjRwuA0rVq1wsWLF6HRaHRl58+fR2BgIJydnfXqurm5ITAwEPfu3cPu3bvRvXt366wIERERlYmsp2MmTpyI//znP/jyyy9x9uxZvPXWW8jKysLw4cMBAEOGDMG0adN09d966y3cvXsX48aNw/nz57Fz504sWLAAo0eP1tXZvXs3du3aheTkZOzZswcdOnRAnTp1dPMkIiKi8kG20zEA0K9fP9y6dQszZ85EamoqGjVqhF27dukuVr169aretRzBwcHYvXs3JkyYgAYNGiAoKAjjxo3DlClTdHUyMjIwbdo0XL9+Hd7e3nj11Vcxf/58g7f0EhERkXxkfU5IecXnhBAREZVNhXhOCBERET3dGEKIiIhIFgwhREREJAuGECIiIpIFQwgRERHJgiGEiIiIZMEQQkRERLJgCCEiIiJZMIQQERGRLBhCiIiISBYMIURERCQLhhAiIiKSBUMIERERyYIhhIiIiGTBEEJERESyYAghIiIiWTCEEBERkSwYQoiIiEgWDCFEREQkC4YQIiIikgVDCBEREcmC
IYSIiIhkwRBCREREsmAIISIiIlkwhBAREZEsGEKIiIhIFgwhREREJAuGECIiIpIFQwgRERHJgiGEiIiIZMEQQkRERLJgCCEiIiJZMIQQERGRLBhCiIiISBYMIURERCQLhhAiIiKSBUMIERERyYIhhIiIiGThKHcDiIjIvqjVwG+/ASkpQGAg0KYNoFTK3SoqjxhCqEz4IUP2jtt42cTEAOPGAdev55dVrw4sWwb06iVfu8zF7cE6GELIZPb6IUNlZ28f0NzGyyYmBujdGxBCv/zGDal8+/aK+frZenuwt/dTiQQVkZGRIQCIjIwMuZtS7nzzjRAKhRDSx0z+oFBIwzffyN1CsrVvvhGienX97aF69Yq7Ldh6G3/yRIi4OCE2bpR+Pnli2fnbypMnRbeDwq9fcHDFWz9bbw/28H4yZR+qEKJwZqXMzEx4enoiIyMDHh4ecjen3FCrgdBQ/aOBghQK6eggOdmOU3sFY+0jquKOfBUK6WdFO/K19TZuTz0u+/YBHTqUXu/f/wZ69ABq1ACcnMxfrjW38SdPgJAQ4OZNw+MVCsDHB9i2DfD0BCpXlgY3N2kwtR328n4yZR/KEGIAQ4hhxn7IxMUB7dtbuzVUGmvv4OwxlJqyI3V0lHY8BQcPD+mnry/gUMq9h/aywwGAixeBWbOAjRuNn8bBQdo+pk8HXn9dKnvwADh1CggLkwKFMa9hWbdxIYD794Fr16TtWPtRv2OHNP21a8CVK0BurvHrVNjWrUCfPtLve/cCM2fqBxXt75UrA927A1272sf7yZR9KK8JIaOlpJhW7/Bh4NAhoH594LnngKCg/A9YUzxV50ctxFLn5h8/ll73lBTpaDAiAqhTRxq3Zk3xH5iAtOxr16T/XUUJpadPG1fvp5+AX38tfvylS9KOFJACy+bN+mGlcmXgyy+L/n8AqUyhAMaPl3ZM5XFbFwKIjwe++04aEhONnzY4GLh1C8jOBq5e1R938iTQrp30u0ol9UKEheUPnToBDRpI40vbxrdtA159NX++334rba/XrknD9etAVpY0ftcuoGNH6fd790r+3xoSECAFpocPpUGjkcorVdJv1+HDxc/jyRPj3k//+59xQbmiYAghowUGmlbvhx+A6Oj8cg+P/EBSvz4wcKB0xFgSW3ZX20vYUaul16ykHdy4cdL/IS1N6hYPCZHGnzoFvPuuFDhSUqQjxYIWLcoPIYXHFcfY8Cq31auBsWONq/v880DVqkBGhjRkZub//uCBFDS0Ll4ETpwwrS3aHc6qVcDIkYCrq2nTW4N22wGAvDzghRek9QakXqH27YHff5deA0PbXsEjeYVC2vaSk/PDGiCF3tBQad1zcoDz56VBy8NDCiFqNfDWW8Vv44A0vkcP6T18+jQwb57h9fL1zQ8jgBSC1q+XwtKNG9LnVGk2bcoP2kJIAevhQ8DdPb/OCy9IQUgbVLRDVpb009j/ce/eUg/Liy8aV7+84+kYA3g6xjBTu9+3bJFCRGKi9EHy5Il+/eRkaX6AdFR47JgUTrRBZf9+23VXP43n5rUWLgSmTJF+T0gAGjfWH+/iIoWyatWA//s/YPhwqfznn4HOnUuff1wckJ4O7N4NDB4MtG1beje7Lfz5p7SdakNVYqLU06NSSTtAQ4zpElerpfXTbqfnzgF//aUfVg4dkrr9jeHoKO14IyOBBQuAKlVMW8/CbTMlaGdlAb/8Iu08k5KAI0fy12vUKKnXoGdPaTuoUiW/dwLQf9+a+p598kQKIsnJ0nD5svRzzBigeXPTTw2fOgV8+qkULIKDpf+h9mdJO3/tZ96NG6UHK3MPWEx53545A9StK/3+5ZdSr0+bNkDr1kDTptI2bAxrHXiZtA+16iWyFRTvjsmnVguxYoUQKSnS39orxQtfLV7aleI5OUKcPi3Epk1CvP++EH36SPPW6t276NXnDg4lX2lfrZoQd+/qz6cs7OGOn3v3hNizR4j584Vo0qT4163g4OwsRO3a
Qixblj+fzEwhvvpKiL17hfjzT2m+Go3hZWrvhjD02hW+G+LFF/PLa9QQ4r33hDh71havjL6MDCE++0yIyEipLQMG6I+/erXs27gp4uKM+x9VqZL/u0olvY+0Pv5Yeh2//z7//VkSY++6uHVLiHXrhOjeXQhXV/36p0+XbTnBwZZ9H23caNzrt3Gj+cuyxfYghHHvp+rVpfd5wfdk//769VxchGjbVto2fv5ZiOzs4tfLWnfhmLIPlT2ErFixQoSEhAiVSiUiIyPF0aNHS6x/79498fbbb4uAgADh7OwswsPDxc6dO3Xjnzx5It5//30RGhoqXFxcRM2aNcXcuXOFprhPUgMYQiTXrwsRFSVtnK+8kr/hW+ND5ocfhJgyRYguXYQIDTXuA6bg4O4uhZI6dfQ/qNesEWLMGOkNuXChEKtWCbF+vfTB/euv0hvU1rcVWuKWzIwMIZYvF2LQICGeecb01wuQ1t9cxn5A/+9/QowcKYSHh369pk2F+OST4oOOJWg00vKHDhWiUqX8ZTs6SiHE0LKtvSM1NsDl5Qlx5YoQ27bph0UhhIiI0J+mRg0p3P/730IcOFB0fYwJ2h99VDT8h4YKMWGCEPv3G7+tWvu2Y2NDXFycZZZni2ClXY6pgefUKSGWLBGiVy8hqlbVn87BQTqwKFj35k3rH3hVmBCyefNm4ezsLD7//HPx559/ilGjRgkvLy+RlpZmsH5OTo5o2rSp6Ny5szhw4IBITk4W+/btEwkJCbo68+fPFz4+PuLHH38UycnJYtu2baJy5cpiWeF3cAkYQoTYsiX/KMzVVYhPP9X/sLb2h8znn5dtx+rkpN/O7t1Lrr9rl3HznT5diEePzF8vU48+njyRjj4//1yI7dvzyzMyin6I1KwpHRV99JEQfn7G9VBYgikf0I8eSdvWP/8phQBAiPbt9evk5ZW8PFO3vS5d9NtWp470GqWmWnY5pjL3CPs//xFixAgh6tcvOo9nntFfD19f44L2Tz9JZY0aCTF7thAJCdYNiGVlSi+cJZdpi+e5mBN4NBohzp0T4r//lUJ3z5764//xD2l+SqVx20NZVZgQEhkZKUaPHq37W61Wi2rVqono6GiD9T/99FNRs2ZNkZubW+w8u3TpIkaMGKFX1qtXLzFw4MBip8nOzhYZGRm64dq1a09tCLl/X4jBg/WPVM+ds307jD3S2bVLiLQ0IS5cEOLECSF++01/Pps3SwFi7Fghhg2TjhaiooRo3lyI554TYsMG4wPOgwf58508WTrt0aOHEO+8I/WybNggHS3+9ZfhU0SlHX1s3y7E5ctCbN0qxKRJQrRrJ4SbW3691q315/fmm0LMmSN1ud66ZXhZ1u5C1irLB3R6utSb88MP+WUpKUJ4e0u9Jvv3F30dSwtxeXlC7NihHxgXLZJexxEjhDh4sHztVC11hJ2ZKfVsRUdLO54JE/LH7d1r3PYdFyf1Il66ZNFVtBpbb+O2ZI3Ao1YL0ayZ8Z935vQiVYgQkpOTI5RKpfj222/1yocMGSK6detmcJpXXnlFDBw4UIwaNUr4+fmJ+vXri/nz54snBf5D8+fPFyEhISIpKUkIIURCQoLw8/MTX3/9dbFtmTVrlgBQZHjaQsiZM0KEhEgboIODdO1GCXnPqmx1pGNs2GneXH+6Tp1Krv/wYX7dlSuFeOstITw9Sz/68PMrOs7NTQokM2eatm626kK2pFWr9NsbEiKFyHPnSg9xPXsKERgolW3YkD/PzEz9LunyxtpH2KtXG7eNW+L6CVuriNu43P77X+tvDxUihNy4cUMAEIcOHdIrnzx5soiMjDQ4zbPPPitUKpUYMWKE+P3338XmzZuFt7e3mD17tq6OWq0WU6ZMEQqFQjg6OgqFQiEWLFhQYlvYEyJ58EC6UDEsrOg5ZTnY4kinrGHnzBnpiHvVKiGmTZN6jzp0kF6/atX063bsaPzRR5s2Ug/Lm29Kp2ASE83bKVW0R4Kr1VI7R4woev2I
k5Nxr2HVqtK1QCSx9fUTtlbRtnG52WJ7sNsQEh4eLoKDg/V6PhYvXiwCAgJ0f2/atElUr15dbNq0Sfzxxx/iq6++Et7e3uKLL74wum1P0zUhhU8dnDsnXW9QXtjiSMfSYadwd/+WLaVfm6IdCh7BP+0ePZJOp3XpUvKdUgWHOXP0L0wmea6foPLLFtuDKftQ2e7W9/X1hVKpRFpaml55WloaAgICDE4TGBiIZ555BsoCNzLXrVsXqampyP372bqTJ0/G1KlT0b9/f0RERGDw4MGYMGECogs+NYsgBLBihfRMjk8+yS9/9tn8xxeXB716Sc8IiIuTHgkdFyfdk2/JZ3f06iU9vyAoSL+8evWyPYuk8FNh+/aVnn5pjGrVTFuWPXN1Bfr1A378EVi50rhpwsMBZ2frtquiUSql590ARbdN7d9Ll1bMB/OR6crb9iBbCHF2dkaTJk0QGxurK9NoNIiNjUWLFi0MTtOqVStcvHgRGu0zcQGcP38egYGBcP77k+fRo0dwKPQkJKVSqTfN0+7mTeCVV6QH/2RnSzt2IeRuVfGUSumBQ6+9Jv20xpvD2mGnTRsp1BT32HqFQnp4Ups2llmevdE+UKw0xj7V92lj6aBNFVu52h7K3uFivs2bNwuVSiW++OILcebMGfH6668LLy8vkfr3vXODBw8WU6dO1dW/evWqcHd3F++8845ISkoSP/74o/Dz8xMffPCBrs7QoUNFUFCQ7hbdmJgY4evrK/71r38Z3S5Ln44pT+csv/lGuvsAkB5qY+1nNFA+e76a39p4SsEyytNnEcnPWttDhbgmROuTTz4RNWrUEM7OziIyMlIcOXJEN65du3Zi6NChevUPHTokmjdvLlQqlahZs2aRu2MyMzPFuHHjRI0aNXQPK5s+fbrIMeFEsSVDiDWfSmeKjAzpFlVtGxo1kp6ISbbFq/nLjiGOqGIwZR+qEKI8d8TLw1LfHSPHV3UX910AJ09K3z2hVkvfEzJnDs+dy8VevihPDoa+4yc4WDqHzVMKROWDKftQhhADLBFCSvuyNwDw9weOH5e+BtrJqWxtLai0L2H77DPpS4/atjV/WURyYYgjKt8YQsxkiRBi6jeZVqkCpKbm906sWyd986afn/SV4QV/+vpK36xZkBy9LkRERIWZsg91LHEslVlKinH1FAopOKjV+qdHtm4Fdu0qfprs7Pz6S5cC779v+A4XIaT648cD3bvziJGIiMoPhhArMfZWwb17gYYNgXv39MtffRWoVQtITwdu3cr/eecO4O6uH1g2bgSysopfhhDAtWtSF3b79iavChERkVUwhFiJ9rkQN24Y7qFQKKTx7dpJvRM+PvrjR440PF+1GsjI0C9r0UK6tqQ0xvbOEBER2YJsDyuzd9Z6Kp1SCXh765f17GnctHyQExERlScMIVZkq6fS8WmcRERUEfF0jJX16iVdEGrNWwq1vS69e+df6KrF74YgIqLyiiHEBrTffWJN2l4XQ88J4YOciIioPGIIsSO26HUhIiKyFIYQO2OLXhciIiJL4IWpREREJAuGECIiIpIFQwgRERHJgiGEiIiIZMEQQkRERLJgCCEiIiJZ8BZdA8TfjxzNzMyUuSVEREQVi3bfKQx9e2shDCEGPHjwAAAQHBwsc0uIiIgqpgcPHsDT07PEOgphTFR5ymg0Gty8eRPu7u5QFPetcOVYZmYmgoODce3aNXh4eMjdHIuwt3Wyt/UBuE4VBdepYqjI6ySEwIMHD1CtWjU4OJR81Qd7QgxwcHBA9erV5W6G2Tw8PCrcxlsae1sne1sfgOtUUXCdKoaKuk6l9YBo8cJUIiIikgVDCBEREcmCIcQOqVQqzJo1CyqVSu6mWIy9rZO9rQ/AdaoouE4Vgz2ukyG8MJWIiIhkwZ4QIiIikgVDCBEREcmCIYSIiIhkwRBCREREsmAIsRPR0dFo1qwZ3N3d4efnhx49eiApKUnuZlnUwoULoVAoMH78eLmbYpYbN25g
0KBB8PHxgaurKyIiIvD777/L3awyU6vVmDFjBsLCwuDq6opatWph3rx5Rn1vRHnxv//9D127dkW1atWgUCjw3Xff6Y0XQmDmzJkIDAyEq6sroqKicOHCBXkaa6SS1ikvLw9TpkxBREQE3NzcUK1aNQwZMgQ3b96Ur8FGKO3/VNCbb74JhUKBpUuX2qx9pjJmfc6ePYtu3brB09MTbm5uaNasGa5evWr7xloJQ4id2L9/P0aPHo0jR45gz549yMvLw8svv4ysrCy5m2YRx48fx2effYYGDRrI3RSz3Lt3D61atYKTkxN+/vlnnDlzBosXL0aVKlXkblqZLVq0CJ9++ilWrFiBs2fPYtGiRfjwww/xySefyN00o2VlZaFhw4ZYuXKlwfEffvghli9fjtWrV+Po0aNwc3NDx44dkZ2dbeOWGq+kdXr06BHi4+MxY8YMxMfHIyYmBklJSejWrZsMLTVeaf8nrW+//RZHjhxBtWrVbNSysiltff766y+0bt0aderUwb59+/DHH39gxowZcHFxsXFLrUiQXUpPTxcAxP79++VuitkePHggwsPDxZ49e0S7du3EuHHj5G5SmU2ZMkW0bt1a7mZYVJcuXcSIESP0ynr16iUGDhwoU4vMA0B8++23ur81Go0ICAgQH330ka7s/v37QqVSiU2bNsnQQtMVXidDjh07JgCIK1eu2KZRZipuna5fvy6CgoJEYmKiCAkJER9//LHN21YWhtanX79+YtCgQfI0yEbYE2KnMjIyAADe3t4yt8R8o0ePRpcuXRAVFSV3U8y2Y8cONG3aFH369IGfnx8aN26M//znP3I3yywtW7ZEbGwszp8/DwA4deoUDhw4gFdeeUXmlllGcnIyUlNT9bY/T09PNG/eHIcPH5axZZaVkZEBhUIBLy8vuZtSZhqNBoMHD8bkyZNRv359uZtjFo1Gg507d+KZZ55Bx44d4efnh+bNm5d4CqoiYgixQxqNBuPHj0erVq3w3HPPyd0cs2zevBnx8fGIjo6WuykWcenSJXz66acIDw/H7t278dZbb2Hs2LH48ssv5W5amU2dOhX9+/dHnTp14OTkhMaNG2P8+PEYOHCg3E2ziNTUVACAv7+/Xrm/v79uXEWXnZ2NKVOm4LXXXquQX5amtWjRIjg6OmLs2LFyN8Vs6enpePjwIRYuXIhOnTrhl19+Qc+ePdGrVy/s379f7uZZDL9F1w6NHj0aiYmJOHDggNxNMcu1a9cwbtw47Nmzx27OgWo0GjRt2hQLFiwAADRu3BiJiYlYvXo1hg4dKnPrymbr1q3YsGEDNm7ciPr16yMhIQHjx49HtWrVKuw6PU3y8vLQt29fCCHw6aefyt2cMjtx4gSWLVuG+Ph4KBQKuZtjNo1GAwDo3r07JkyYAABo1KgRDh06hNWrV6Ndu3ZyNs9i2BNiZ9555x38+OOPiIuLQ/Xq1eVujllOnDiB9PR0PP/883B0dISjoyP279+P5cuXw9HREWq1Wu4mmiwwMBD16tXTK6tbt26Fvtp98uTJut6QiIgIDB48GBMmTLCb3quAgAAAQFpaml55WlqablxFpQ0gV65cwZ49eyp0L8hvv/2G9PR01KhRQ/d5ceXKFbz77rsIDQ2Vu3km8/X1haOjo919XhTGnhA7IYTAmDFj8O2332Lfvn0ICwuTu0lme/HFF3H69Gm9suHDh6NOnTqYMmUKlEqlTC0ru1atWhW5dfr8+fMICQmRqUXme/ToERwc9I9nlEql7kiuogsLC0NAQABiY2PRqFEjAEBmZiaOHj2Kt956S97GmUEbQC5cuIC4uDj4+PjI3SSzDB48uMh1Yx07dsTgwYMxfPhwmVpVds7OzmjWrJndfV4UxhBiJ0aPHo2NGzfi+++/h7u7u+5ctaenJ1xdXWVuXdm4u7sXuabFzc0NPj4+FfZalwkTJqBly5ZYsGAB+vbti2PHjmHNmjVYs2aN3E0rs65du2L+/PmoUaMG6tevj5Mn
T2LJkiUYMWKE3E0z2sOHD3Hx4kXd38nJyUhISIC3tzdq1KiB8ePH44MPPkB4eDjCwsIwY8YMVKtWDT169JCv0aUoaZ0CAwPRu3dvxMfH48cff4RardZ9Znh7e8PZ2VmuZpeotP9T4SDl5OSEgIAAPPvss7ZuqlFKW5/JkyejX79+aNu2LTp06IBdu3bhhx9+wL59++RrtKXJfXsOWQYAg8O6devkbppFVfRbdIUQ4ocffhDPPfecUKlUok6dOmLNmjVyN8ksmZmZYty4caJGjRrCxcVF1KxZU0yfPl3k5OTI3TSjxcXFGXz/DB06VAgh3aY7Y8YM4e/vL1QqlXjxxRdFUlKSvI0uRUnrlJycXOxnRlxcnNxNL1Zp/6fCyvstusasz9q1a0Xt2rWFi4uLaNiwofjuu+/ka7AVKISoQI81JCIiIrvBC1OJiIhIFgwhREREJAuGECIiIpIFQwgRERHJgiGEiIiIZMEQQkRERLJgCCEiIiJZMIQQERGRLBhCiOipsG/fPigUCty/f1/uphDR3xhCiIiISBYMIURERCQLhhAisgmNRoPo6GiEhYXB1dUVDRs2xPbt2wHknyrZuXMnGjRoABcXF/zjH/9AYmKi3jy++eYb1K9fHyqVCqGhoVi8eLHe+JycHEyZMgXBwcFQqVSoXbs21q5dq1fnxIkTaNq0KSpVqoSWLVsW+ap0IrIdhhAisono6Gh89dVXWL16Nf78809MmDABgwYNwv79+3V1Jk+ejMWLF+P48eOoWrUqunbtiry8PABSeOjbty/69++P06dPY/bs2ZgxYwa++OIL3fRDhgzBpk2bsHz5cpw9exafffYZKleurNeO6dOnY/Hixfj999/h6OiIESNG2GT9icgAub/Gl4jsX3Z2tqhUqZI4dOiQXvn//d//iddee033leabN2/Wjbtz545wdXUVW7ZsEUIIMWDAAPHSSy/pTT958mRRr149IYQQSUlJAoDYs2ePwTZol7F3715d2c6dOwUA8fjxY4usJxGZhj0hRGR1Fy9exKNHj/DSSy+hcuXKuuGrr77CX3/9pavXokUL3e/e3t549tlncfbsWQDA2bNn0apVK735tmrVChcuXIBarUZCQgKUSiXatWtXYlsaNGig+z0wMBAAkJ6ebvY6EpHpHOVuABHZv4cPHwIAdu7ciaCgIL1xKpVKL4iUlaurq1H1nJycdL8rFAoA0vUqRGR77AkhIqurV68eVCoVrl69itq1a+sNwcHBunpHjhzR/X7v3j2cP38edevWBQDUrVsXBw8e1JvvwYMH8cwzz0CpVCIiIgIajUbvGhMiKt/YE0JEVufu7o5JkyZhwoQJ0Gg0aN26NTIyMnDw4EF4eHggJCQEADB37lz4+PjA398f06dPh6+vL3r06AEAePfdd9GsWTPMmzcP/fr1w+HDh7FixQqsWrUKABAaGoqhQ4dixIgRWL58ORo2bIgrV64gPT0dffv2lWvViagEDCFEZBPz5s1D1apVER0djUuXLsHLywvPP/883nvvPd3pkIULF2LcuHG4cOECGjVqhB9++AHOzs4AgOeffx5bt27FzJkzMW/ePAQGBmLu3LkYNmyYbhmffvop3nvvPbz99tu4c+cOatSogffee0+O1SUiIyiEEELuRhDR023fvn3o0KED7t27By8vL7mbQ0Q2wmtCiIiISBYMIURERCQLno4hIiIiWbAnhIiIiGTBEEJERESyYAghIiIiWTCEEBERkSwYQoiIiEgWDCFEREQkC4YQIiIikgVDCBEREcni/wGkmC7utmbaSQAAAABJRU5ErkJggg==",
      "text/plain": [
       "<Figure size 600x400 with 1 Axes>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "\n",
       "    <div>\n",
       "      <progress value='17' class='progress-bar-interrupted' max='50' style='width:300px; height:20px; vertical-align: middle;'></progress>\n",
       "      34.00% [17/50 08:30<16:31][earlystopping]\n",
       "      <br>\n",
       "      \n",
       "    </div>\n",
       "    "
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\u001b[0;31m<<<<<< val_f1 without improvement in 5 epoch,early stopping >>>>>>\u001b[0m\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>epoch</th>\n",
       "      <th>train_loss</th>\n",
       "      <th>train_precision</th>\n",
       "      <th>train_recall</th>\n",
       "      <th>train_f1</th>\n",
       "      <th>lr</th>\n",
       "      <th>val_loss</th>\n",
       "      <th>val_precision</th>\n",
       "      <th>val_recall</th>\n",
       "      <th>val_f1</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1</td>\n",
       "      <td>0.280753</td>\n",
       "      <td>0.661162</td>\n",
       "      <td>0.708562</td>\n",
       "      <td>0.678469</td>\n",
       "      <td>0.00003</td>\n",
       "      <td>0.233344</td>\n",
       "      <td>0.697657</td>\n",
       "      <td>0.763197</td>\n",
       "      <td>0.723983</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2</td>\n",
       "      <td>0.280068</td>\n",
       "      <td>0.661054</td>\n",
       "      <td>0.707853</td>\n",
       "      <td>0.678095</td>\n",
       "      <td>0.00003</td>\n",
       "      <td>0.232841</td>\n",
       "      <td>0.698279</td>\n",
       "      <td>0.763289</td>\n",
       "      <td>0.724314</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>3</td>\n",
       "      <td>0.278366</td>\n",
       "      <td>0.663814</td>\n",
       "      <td>0.710813</td>\n",
       "      <td>0.681237</td>\n",
       "      <td>0.00003</td>\n",
       "      <td>0.232643</td>\n",
       "      <td>0.697586</td>\n",
       "      <td>0.763988</td>\n",
       "      <td>0.724272</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>4</td>\n",
       "      <td>0.278720</td>\n",
       "      <td>0.663917</td>\n",
       "      <td>0.709843</td>\n",
       "      <td>0.680477</td>\n",
       "      <td>0.00003</td>\n",
       "      <td>0.232322</td>\n",
       "      <td>0.698723</td>\n",
       "      <td>0.763469</td>\n",
       "      <td>0.724655</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>5</td>\n",
       "      <td>0.279467</td>\n",
       "      <td>0.662568</td>\n",
       "      <td>0.710921</td>\n",
       "      <td>0.680237</td>\n",
       "      <td>0.00003</td>\n",
       "      <td>0.232189</td>\n",
       "      <td>0.697799</td>\n",
       "      <td>0.764892</td>\n",
       "      <td>0.724802</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>6</td>\n",
       "      <td>0.278937</td>\n",
       "      <td>0.662431</td>\n",
       "      <td>0.710440</td>\n",
       "      <td>0.679740</td>\n",
       "      <td>0.00003</td>\n",
       "      <td>0.232076</td>\n",
       "      <td>0.697726</td>\n",
       "      <td>0.764980</td>\n",
       "      <td>0.724800</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>7</td>\n",
       "      <td>0.277210</td>\n",
       "      <td>0.663949</td>\n",
       "      <td>0.710058</td>\n",
       "      <td>0.680680</td>\n",
       "      <td>0.00003</td>\n",
       "      <td>0.231985</td>\n",
       "      <td>0.697693</td>\n",
       "      <td>0.764887</td>\n",
       "      <td>0.724733</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>8</td>\n",
       "      <td>0.278545</td>\n",
       "      <td>0.663654</td>\n",
       "      <td>0.711216</td>\n",
       "      <td>0.681124</td>\n",
       "      <td>0.00003</td>\n",
       "      <td>0.231924</td>\n",
       "      <td>0.697893</td>\n",
       "      <td>0.765281</td>\n",
       "      <td>0.725040</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>9</td>\n",
       "      <td>0.278285</td>\n",
       "      <td>0.662669</td>\n",
       "      <td>0.708992</td>\n",
       "      <td>0.679494</td>\n",
       "      <td>0.00003</td>\n",
       "      <td>0.231828</td>\n",
       "      <td>0.698353</td>\n",
       "      <td>0.765430</td>\n",
       "      <td>0.725370</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>10</td>\n",
       "      <td>0.277281</td>\n",
       "      <td>0.663959</td>\n",
       "      <td>0.710993</td>\n",
       "      <td>0.681228</td>\n",
       "      <td>0.00003</td>\n",
       "      <td>0.231806</td>\n",
       "      <td>0.697125</td>\n",
       "      <td>0.766040</td>\n",
       "      <td>0.724946</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>11</td>\n",
       "      <td>0.278197</td>\n",
       "      <td>0.663533</td>\n",
       "      <td>0.710416</td>\n",
       "      <td>0.680969</td>\n",
       "      <td>0.00003</td>\n",
       "      <td>0.231684</td>\n",
       "      <td>0.698009</td>\n",
       "      <td>0.765550</td>\n",
       "      <td>0.725177</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>12</td>\n",
       "      <td>0.278732</td>\n",
       "      <td>0.662374</td>\n",
       "      <td>0.710190</td>\n",
       "      <td>0.680012</td>\n",
       "      <td>0.00003</td>\n",
       "      <td>0.231538</td>\n",
       "      <td>0.698657</td>\n",
       "      <td>0.765585</td>\n",
       "      <td>0.725568</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12</th>\n",
       "      <td>13</td>\n",
       "      <td>0.277496</td>\n",
       "      <td>0.664046</td>\n",
       "      <td>0.711030</td>\n",
       "      <td>0.681418</td>\n",
       "      <td>0.00003</td>\n",
       "      <td>0.231492</td>\n",
       "      <td>0.698746</td>\n",
       "      <td>0.765299</td>\n",
       "      <td>0.725462</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13</th>\n",
       "      <td>14</td>\n",
       "      <td>0.277513</td>\n",
       "      <td>0.663730</td>\n",
       "      <td>0.710578</td>\n",
       "      <td>0.680840</td>\n",
       "      <td>0.00003</td>\n",
       "      <td>0.231564</td>\n",
       "      <td>0.697596</td>\n",
       "      <td>0.765870</td>\n",
       "      <td>0.725103</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>14</th>\n",
       "      <td>15</td>\n",
       "      <td>0.276898</td>\n",
       "      <td>0.663613</td>\n",
       "      <td>0.711514</td>\n",
       "      <td>0.681343</td>\n",
       "      <td>0.00003</td>\n",
       "      <td>0.231475</td>\n",
       "      <td>0.697581</td>\n",
       "      <td>0.765810</td>\n",
       "      <td>0.725047</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>15</th>\n",
       "      <td>16</td>\n",
       "      <td>0.277615</td>\n",
       "      <td>0.663486</td>\n",
       "      <td>0.711608</td>\n",
       "      <td>0.681136</td>\n",
       "      <td>0.00003</td>\n",
       "      <td>0.231406</td>\n",
       "      <td>0.698154</td>\n",
       "      <td>0.765119</td>\n",
       "      <td>0.725006</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>16</th>\n",
       "      <td>17</td>\n",
       "      <td>0.277530</td>\n",
       "      <td>0.662574</td>\n",
       "      <td>0.710733</td>\n",
       "      <td>0.680074</td>\n",
       "      <td>0.00003</td>\n",
       "      <td>0.231329</td>\n",
       "      <td>0.698158</td>\n",
       "      <td>0.765239</td>\n",
       "      <td>0.725082</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    epoch  train_loss  train_precision  train_recall  train_f1       lr  \\\n",
       "0       1    0.280753         0.661162      0.708562  0.678469  0.00003   \n",
       "1       2    0.280068         0.661054      0.707853  0.678095  0.00003   \n",
       "2       3    0.278366         0.663814      0.710813  0.681237  0.00003   \n",
       "3       4    0.278720         0.663917      0.709843  0.680477  0.00003   \n",
       "4       5    0.279467         0.662568      0.710921  0.680237  0.00003   \n",
       "5       6    0.278937         0.662431      0.710440  0.679740  0.00003   \n",
       "6       7    0.277210         0.663949      0.710058  0.680680  0.00003   \n",
       "7       8    0.278545         0.663654      0.711216  0.681124  0.00003   \n",
       "8       9    0.278285         0.662669      0.708992  0.679494  0.00003   \n",
       "9      10    0.277281         0.663959      0.710993  0.681228  0.00003   \n",
       "10     11    0.278197         0.663533      0.710416  0.680969  0.00003   \n",
       "11     12    0.278732         0.662374      0.710190  0.680012  0.00003   \n",
       "12     13    0.277496         0.664046      0.711030  0.681418  0.00003   \n",
       "13     14    0.277513         0.663730      0.710578  0.680840  0.00003   \n",
       "14     15    0.276898         0.663613      0.711514  0.681343  0.00003   \n",
       "15     16    0.277615         0.663486      0.711608  0.681136  0.00003   \n",
       "16     17    0.277530         0.662574      0.710733  0.680074  0.00003   \n",
       "\n",
       "    val_loss  val_precision  val_recall    val_f1  \n",
       "0   0.233344       0.697657    0.763197  0.723983  \n",
       "1   0.232841       0.698279    0.763289  0.724314  \n",
       "2   0.232643       0.697586    0.763988  0.724272  \n",
       "3   0.232322       0.698723    0.763469  0.724655  \n",
       "4   0.232189       0.697799    0.764892  0.724802  \n",
       "5   0.232076       0.697726    0.764980  0.724800  \n",
       "6   0.231985       0.697693    0.764887  0.724733  \n",
       "7   0.231924       0.697893    0.765281  0.725040  \n",
       "8   0.231828       0.698353    0.765430  0.725370  \n",
       "9   0.231806       0.697125    0.766040  0.724946  \n",
       "10  0.231684       0.698009    0.765550  0.725177  \n",
       "11  0.231538       0.698657    0.765585  0.725568  \n",
       "12  0.231492       0.698746    0.765299  0.725462  \n",
       "13  0.231564       0.697596    0.765870  0.725103  \n",
       "14  0.231475       0.697581    0.765810  0.725047  \n",
       "15  0.231406       0.698154    0.765119  0.725006  \n",
       "16  0.231329       0.698158    0.765239  0.725082  "
      ]
     },
     "execution_count": 159,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "keras_model.fit(\n",
    "    train_data = dl_train,\n",
    "    val_data= dl_val,\n",
    "    ckpt_path='bert_ner.pt',\n",
    "    epochs=50,\n",
    "    patience=5,\n",
    "    monitor=\"val_f1\", \n",
    "    mode=\"max\",\n",
    "    plot = True,\n",
    "    wandb = False,\n",
    "    quiet = True\n",
    ")\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "94c10e97-1da6-4f84-8754-d35768d0de2b",
   "metadata": {},
   "source": [
    "## 4. Evaluate the model"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 160,
   "id": "fa3159ca-7fda-4d45-8500-834edc5b9c1e",
   "metadata": {},
   "outputs": [],
   "source": [
    "from torchmetrics import Accuracy"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 161,
   "id": "801cc5d8-eb68-495c-a54c-ea256bc02e47",
   "metadata": {},
   "outputs": [],
   "source": [
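    "# Token-level accuracy over the 21 label classes\n",
    "acc = Accuracy(task='multiclass',num_classes=21)\n",
    "acc = keras_model.accelerator.prepare(acc)\n",
    "\n",
    "# Note: the validation dataloader is reused here as the evaluation set\n",
    "dl_test = keras_model.accelerator.prepare(dl_val)\n",
    "net = keras_model.accelerator.prepare(net)"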
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 164,
   "id": "d4579ed5-e45c-4757-80ed-1a2207a59991",
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "100%|██████████| 1344/1344 [00:12<00:00, 107.75it/s]\n"
     ]
    }
   ],
   "source": [
    "\n",
    "from tqdm import tqdm \n",
    "for batch in tqdm(dl_test):\n",
    "    with torch.no_grad():\n",
    "        outputs = net(**batch)\n",
    "        \n",
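    "    labels = batch['labels']\n",
    "    # Negative labels are remapped to class 0 so that the metric accepts them\n",
    "    labels[labels<0]=0\n",
    "    # Predicted class per token: argmax over the label dimension\n",
    "    preds =(outputs.logits).argmax(axis=2) \n",
    "    acc.update(preds,labels)\n",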
    "    "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 165,
   "id": "9cf4bbad-d2f4-431b-a614-419e234520c6",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "tensor(0.9178, device='cuda:0')"
      ]
     },
     "execution_count": 165,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "acc.compute()  # This accuracy also counts the 'O' class, so it overestimates NER quality."
   ]
  },
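  {
   "cell_type": "markdown",
   "id": "7b1e2c40-5d3a-4f6e-9a21-0c4d8e6f1a01",
   "metadata": {},
   "source": [
    "Because the accuracy above counts every position, including the 'O' class and the positions remapped to class 0, it can help to look at accuracy on the remaining tokens only. The following is a minimal sketch rather than part of the original workflow. The function name `masked_token_accuracy`, the ignore value -100 and the assumption that 'O' has id 0 are all illustrative; check them against your own label mapping. With tensors, pass something like `preds.flatten().tolist()` and `labels.flatten().tolist()`.\n",
    "\n",
    "```python\n",
    "def masked_token_accuracy(preds, labels, ignore_index=-100, outside_id=None):\n",
    "    # Count correct predictions only on positions we want to score.\n",
    "    correct = total = 0\n",
    "    for p, t in zip(preds, labels):\n",
    "        # Skip ignored positions and, optionally, tokens whose gold label is 'O'.\n",
    "        if t == ignore_index or t == outside_id:\n",
    "            continue\n",
    "        total += 1\n",
    "        correct += int(p == t)\n",
    "    return correct / total if total else float('nan')\n",
    "\n",
    "\n",
    "# Toy example: -100 marks ignored positions, 0 is assumed to be the id of 'O'.\n",
    "labels = [-100, 0, 3, 4, 0, -100]\n",
    "preds = [0, 0, 3, 0, 0, 0]\n",
    "print(masked_token_accuracy(preds, labels))                # 0.75 (all labelled tokens)\n",
    "print(masked_token_accuracy(preds, labels, outside_id=0))  # 0.5 (entity tokens only)\n",
    "```\n",
    "\n",
    "The gap between the two numbers in the toy example illustrates why an accuracy that includes 'O' looks better than the entity-level metrics reported during training."
   ]
  },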
  {
   "cell_type": "markdown",
   "id": "904f651f-96e1-46ad-8c7f-3e4fdcf81218",
   "metadata": {},
   "source": [
    "## 5. Use the model"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c07e4869-40fe-4520-8fef-86a302684def",
   "metadata": {},
   "source": [
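    "We can use a pipeline to chain the whole prediction process together.\n",
    "\n",
    "Note that we use the built-in 'simple' aggregation_strategy here,\n",
    "\n",
    "which automatically merges the tokens that belong together into a single entity.\n"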
   ]
  },
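  {
   "cell_type": "markdown",
   "id": "c3d9f512-8a47-4b0e-b6d1-2e5f7a9c4b02",
   "metadata": {},
   "source": [
    "To make the merging step concrete, here is a simplified sketch of how B-/I- tags can be grouped into entities. It is not the transformers implementation, and the helper name `merge_bio` is hypothetical. It only illustrates the idea that a B- tag opens a new entity and the following I- tags of the same type extend it.\n",
    "\n",
    "```python\n",
    "def merge_bio(tokens, tags):\n",
    "    # Group per-token BIO tags into entity spans (indices are token positions).\n",
    "    entities, current = [], None\n",
    "    for i, (tok, tag) in enumerate(zip(tokens, tags)):\n",
    "        prefix, _, etype = tag.partition('-')\n",
    "        starts_new = prefix == 'B' or (\n",
    "            prefix == 'I' and (current is None or current['entity_group'] != etype)\n",
    "        )\n",
    "        if starts_new:\n",
    "            # Close the running entity, if any, and open a new one.\n",
    "            if current is not None:\n",
    "                entities.append(current)\n",
    "            current = {'entity_group': etype, 'word': tok, 'start': i, 'end': i + 1}\n",
    "        elif prefix == 'I':\n",
    "            # Same type as the running entity: extend it.\n",
    "            current['word'] += tok\n",
    "            current['end'] = i + 1\n",
    "        else:\n",
    "            # An 'O' tag ends the running entity.\n",
    "            if current is not None:\n",
    "                entities.append(current)\n",
    "            current = None\n",
    "    if current is not None:\n",
    "        entities.append(current)\n",
    "    return entities\n",
    "\n",
    "\n",
    "tokens = list('我爱北京天安门')\n",
    "tags = ['O', 'O', 'B-Loc', 'I-Loc', 'B-Loc', 'I-Loc', 'I-Loc']\n",
    "print(merge_bio(tokens, tags))\n",
    "```\n",
    "\n",
    "On this example the sketch yields two Loc entities, '北京' and '天安门', instead of one merged span."
   ]
  },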
  {
   "cell_type": "code",
   "execution_count": 166,
   "id": "24cf1164-eb6b-48be-b148-c29fde546986",
   "metadata": {},
   "outputs": [],
   "source": [
    "from transformers import pipeline"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 167,
   "id": "933aacbe-cde1-49d7-9d06-baf563385e4c",
   "metadata": {},
   "outputs": [],
   "source": [
    "recognizer = pipeline(\"token-classification\", \n",
    "                      model=net, tokenizer=tokenizer, aggregation_strategy='simple')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 168,
   "id": "c05d9178-2395-4498-8353-8dd0b9e3b1e7",
   "metadata": {},
   "outputs": [],
   "source": [
    "net.to('cpu');"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 169,
   "id": "39083269-fe16-4cdb-a46a-efc14de9e9a6",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[{'entity_group': 'name',\n",
       "  'score': 0.6913842,\n",
       "  'word': '小 明',\n",
       "  'start': None,\n",
       "  'end': None},\n",
       " {'entity_group': 'name',\n",
       "  'score': 0.58951116,\n",
       "  'word': '小 红',\n",
       "  'start': None,\n",
       "  'end': None},\n",
       " {'entity_group': 'name',\n",
       "  'score': 0.74060774,\n",
       "  'word': '安 利',\n",
       "  'start': None,\n",
       "  'end': None}]"
      ]
     },
     "execution_count": 169,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "recognizer('小明对小红说，“你听说过安利吗？”')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e51d248d-f73f-40e9-9a36-b299b1d9c0e9",
   "metadata": {},
   "source": [
    "## 6. Save the model"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "63f5f9f1-b230-4518-8412-de71aa1da45f",
   "metadata": {},
   "source": [
    "After saving the model and the tokenizer, we can load both with a single pipeline and run batch prediction."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 170,
   "id": "22913dd2-869b-49f8-aad8-0f965706c539",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "('ner_bert/tokenizer_config.json',\n",
       " 'ner_bert/special_tokens_map.json',\n",
       " 'ner_bert/vocab.txt',\n",
       " 'ner_bert/added_tokens.json')"
      ]
     },
     "execution_count": 170,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "net.save_pretrained(\"ner_bert\")\n",
    "tokenizer.save_pretrained(\"ner_bert\")\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 171,
   "id": "48bb9d26-27f5-4686-bc8d-d72813a8909b",
   "metadata": {},
   "outputs": [],
   "source": [
    "recognizer = pipeline(\"token-classification\", \n",
    "                      model=\"ner_bert\",\n",
    "                      aggregation_strategy='simple')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 172,
   "id": "5c761d0d-d28b-4f17-91b0-38e82bd48f3f",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[{'entity_group': 'name',\n",
       "  'score': 0.6913842,\n",
       "  'word': '小 明',\n",
       "  'start': 0,\n",
       "  'end': 2},\n",
       " {'entity_group': 'name',\n",
       "  'score': 0.58951116,\n",
       "  'word': '小 红',\n",
       "  'start': 3,\n",
       "  'end': 5},\n",
       " {'entity_group': 'name',\n",
       "  'score': 0.74060774,\n",
       "  'word': '安 利',\n",
       "  'start': 12,\n",
       "  'end': 14}]"
      ]
     },
     "execution_count": 172,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "recognizer('小明对小红说，“你听说过安利吗？”')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "419fd868-e1cc-4c24-bf29-64fc9ecdc7e5",
   "metadata": {},
   "source": [
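    "**If this project helps you and you would like to encourage the author, please give it a star ⭐️ and share it with your friends 😊!** \n",
    "\n",
    "If you run into problems while using torchkeras, you can open an issue in the project.\n",
    "\n",
    "For faster feedback, or to chat with other torchkeras users,\n",
    "\n",
    "reply with the keyword **加群** (join the group) in the backend of the WeChat official account 算法美食屋.\n",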
    "\n",
    "![](https://tva1.sinaimg.cn/large/e6c9d24egy1h41m2zugguj20k00b9q46.jpg)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.0"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
